read and write binary files in python

read and write binary files in python - python

I have a binary file called "input.bin".
I am practicing how to work with such files (read them, change the content and write into a new binary file).
the contents of input file:
03 fa 55 12 20 66 67 50 e8 ab
which is in hexadecimal notation.
I want to make a output file which is simply the input file with the value of each byte incremented by one.
here is the expected output:
04 fb 56 13 21 67 68 51 e9 ac
which also will be in hexadecimal notation.
I am trying to do that in python3 using the following command:
with open("input.bin", "rb") as binary_file:
data = binary_file.read()
for item in data:
item2 = item+1
with open("output.bin", "wb") as binary_file2:
binary_file2.write(item2)
but it does not return what I want. do you know how to fix it?

You want to open the output file before the loop, and call write in the loop.
with open("input.bin", "rb") as binary_file:
data = binary_file.read()
with open("output.bin", "wb") as binary_file2:
binary_file2.write(bytes(item - 1 for item in data))

Related

python read binary file byte by byte and read line feed as 0a

This might seem pretty stupid, but I'm a complete newbie in python.
So, I have a binary file that holds data as
47 40 ad e8 66 29 10 87 d7 73 0a 40 10
When I tried to read it with python by
with open(absolutePathInput, "rb") as f:
for line in self.file:
for byte, nextbyte in zip(line[:], line[1:]):
if state == 'wait_for_sync_1':
if ((byte == 0x10) and (nextbyte == 0x87)):
state = 'message_id'
I get
all bytes but 0a is read as line feed(i.e. \n)
It seems that it considers as "line feed" and self.file reads only till 0a.
correct me to read 0a as hex value

How to read double, float and int values from binary files in python?

I have a binary file that was created in C++. The first value is double and the second is integer. I am reading the values fine using the following code in C++.
double dob_value;
int integer_value;
fread(&dob_value, sizeof(dob_value), 1, fp);
fread(&integer_value, sizeof(integer_value), 1, fp);
I am trying to read the same file in python but I am running into issues. My dob_value is 400000000.00 and my integer_value 400000. I am using following code in python for double.
def interpret_float(x):
return struct.unpack('d',x[4:]+x[:4])
with open(file_name, 'rb') as readfile:
dob = readfile.read(8)
dob_value = interpret_float(dob)[0]
val = readfile.read(4)
test2 = readfile.read(4)
integer_value = int.from_bytes(test2, "little")
My dob_value is 400000000.02384186 . My question is where is this extra decimals coming from? Also, how do I get the correct integer_value? With above code, my integer_value is 1091122467. I also have float values after integer but I haven't looked into that yet.

If the link goes broken and just in case the test.bin contains 00 00 00 00 84 D7 B7 41 80 1A 06 00 70 85 69 C0.
Your binary contains correct 41B7D78400000000 hexadecimal representation of 400000000.0 in the first 8 bytes. Running
import binascii
import struct
fname = r'test.bin'
with open(fname, 'rb') as readfile:
dob = readfile.read(8)
print(struct.unpack('d', dob)[0])
print(binascii.hexlify(dob))
outputs
>> 400000000.0
>> b'0000000084d7b741'
which is also correct little endian representation of the double. When you swap parts, you get
print(binascii.hexlify(dob[4:]+dob[:4]))
>> b'84d7b74100000000'
and if you check the decimal value, it will give you 5.45e-315, not what you expect. Moreover,
struct.unpack('d', dob[4:]+dob[:4])[0]
>>5.44740625e-315
So I'm not sure how you could get 400000000.02384186 from the code above. However, to obtain 400000000.02384186 using your test.bin, just skip the four bytes in the beginning:
with open(fname, 'rb') as readfile:
val = readfile.read(4)
dob = readfile.read(8)
dob = dob[4:]+dob[:4]
print(binascii.hexlify(dob))
print(struct.unpack('d', dob)[0])
>>b'801a060084d7b741'
>>400000000.02384186
Binary value 0x41B7D78400061A80 corresponds to 400000000.02384186. So you first read incorrect bytes, then incorrectly swap parts and get a result close to what you expect. Considering integer value, the 400000 is 0x00061A80, which is also present in the binary, but you definitely read past that bytes, since you used them for double, so you get wrong values.

How to read hex values at specific addresses in Python?

Say I have a file and I'm interested in reading and storing hex values at certain addresses, like the snippet below:
22660 00 50 50 04 00 56 0F 50 25 98 8A 19 54 EF 76 00
22670 75 38 D8 B9 90 34 17 75 93 19 93 19 49 71 EF 81
I want to read the value at 0x2266D, and be able to replace it with another hex value, but I can't understand how to do it. I've tried using open('filename', 'rb'), however this reads it as the ASCII representation of the values, and I don't see how to pick and choose when addresses I want to change.
Thanks!
Edit: For an example, I have
rom = open("filename", 'rb')
for i in range(5):
test = rom.next().split()
print test
rom.close()
This outputs: ['NES\x1a', 'B\x00\x00\x00\x00\x00\x00\x00\x00\x00!\x0f\x0f\x0f(\x0f!!', '!\x02\x0f\x0f', '!\x0f\x01\x08', '!:\x0f\x0f\x03!', '\x0f', '\x0f\x0f', '!', '\x0f\x0f!\x0f\x03\x0f\x12', '\x0f\x0f\x0f(\x02&%\x0f', '\x0f', '#', '!\x0f\x0f1', '!"#$\x14\x14\x14\x13\x13\x03\x04\x0f\x0f\x03\x13#!!\x00\x00\x00\x00\x00!!', '(', '\x0f"\x0f', '#\x14\x11\x12\x0f\x0f\x0f#', '\x10', "5'4\x0270&\x02\x02\x02\x02\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f126&\x13\x0f\x0f\x0f\x13&6222\x0f", '\x1c,', etc etc.
Much more than 5 bytes, and while some of it is in hex, some has been replaced with ASCII.

There's no indication that some of the bytes were replaced by their ASCII representations. Some bytes happen to be printable.
With a binary file, you can simply seek to the location offset and write the bytes in. Working with the line-iterator in the case of binary file is problematic, as there's no meaningful "lines" in the binary blob.
You can do in-place editing like follows (in fake Python):
with open("filename", "rb+") as f:
f.seek(0x2266D)
the_byte = f.read(1)
if len(the_byte) != 1:
# something's wrong; bolt out ...
else
transformed_byte = your_function(the_byte)
f.seek(-1, 1) # back one byte relative to the current position
f.write(transformed_byte)
But of course, you may want to do the edit on a copy, either in-memory (and commit later, as in the answer of #JosepValls), or on a file copy. The problem with gulping the whole file in memory is, of course, sometimes the system may choke ;) For that purpose you may want to mmap part of the file.

Given that is not a very big file (roms should fit fine in today's computer's memory), just do data = open('filename', 'rb').read(). Now you can do whatever you want to the data (if you print it, it will show ascii, but that is just data!). Unfortunately, string objects don't support item assignment, see this answer for more:
Change one character in a string in Python?
In your case:
data = data[0:0x2266C] + str(0xFF) + data[0x2266D:]

Python search for a start hex string, save the hex string after that to a new file

PYTHON
I would like to open a text file, search for a HEX pattern that starts with 45 3F, grab the following hex 6 HEX values, for example 45 4F 5D and put all this in a new file. I also know that it always ends with 00 00.
So the file can look like: bla bla sdsfsdf 45 3F 08 DF 5D 00 00 dsafasdfsadf 45 3F 07 D3 5F 00 00 xztert
And should be put in a new file like this:
08 DF 5D
07 D3 5F
How can I do that?
I have tried:
output = open('file.txt','r')
my_data = output.read()
print re.findall(r"45 3F[0-9a-fA-F]", my_data)
but it only prints:
[]
Any suggestions?
Thank you :-)

Here's a complete answer (Python 3):
with open('file.txt') as reader:
my_data = reader.read()
matches = re.findall(r"45 3F ([0-9A-F]{2} [0-9A-F]{2} [0-9A-F]{2})", my_data)
data_to_write = " ".join(matches)
with open('out.txt', 'w') as writer:
writer.write(data_to_write)
re.findall returns the capturing group, so there's no need to clear out '45 3F'.
When dealing with files, use with open so that the file descriptor will be released when indentation ends.
if you want to accept the first two octets dynamically:
search_string = raw_input()
regex_pattern = search_string.upper() + r" ([0-9A-F]{2} [0-9A-F]{2} [0-9A-F]{2})"
matches = re.findall(regex_pattern, my_data)

Reading and writing binary data of dynamic sizes question

I am trying to read data from a file in binary mode and manipulate that data.
try:
resultfile = open("binfile", "rb")
except:
print "Error"
resultsize = os.path.getsize("binfile")
There is a 32 byte header which I parse fine then the buffer of binary data starts. The data can be of any size from 16 to 4092 and can be in any format from text to a pdf or image or anything else. The header has the size of the data so to get this information I do
contents = resultfile.read(resultsize)
and this puts the entire file into a string buffer. I found out this is probably my problem because when I try to copy chunks of the hex data from "contents" into a new file some bytes do not copy correctly so pdfs and images will come out corrupted.
Printing out a bit of the file string buffer in the interpreter yields for example something like "%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n" when I just want the bytes themselves in order to write them to a new file. Is there an easy solution to this problem that I am missing?
Here is an example of a hex dump with the pdf written with my python and the real pdf:
25 50 44 46 2D 31 2E 35 0D 0D 0A 25 B5 B5 B5 B5 0D 0D 0A 31 20 30 20 6F 62 6A 0D 0D 0A
25 50 44 46 2D 31 2E 35 0D 0A 25 B5 B5 B5 B5 0D 0A 31 20 30 20 6F 62 6A
It seems like a 0D is being added whenever there is a 0D 0A. In image files it might be a different byte, I don't remember and might have to test it.
My code to write the new file is pretty simple, using contents as the string buffer holding all the data.
fbuf = contents[offset+8:size+offset]
fl = open(fname, 'a')
fl.write(fbuf)
This is being called in a loop based on a signature that is found in the header. Offset+8 is beginning of actual pdf data and size is the size of the chunk to copy.

You need to open your output file in binary mode, as you do your input file. Otherwise, newline characters may get changed. You can see that this is what happens in your hex dump: 0A characters ('\n') are changed into OD 0A ('\r\n').
This should work:
input_file = open('f1', 'rb')
contents = input_file.read()
#....
data = contents[offset+8:size+offset] #for example
output_file = open('f2', 'wb')
output_file.write(data)

The result you get is "just the bytes themselves". You can write() them to an open file to copy them.
"It seems like a 0D is being added whenever there is a 0D 0A"
Sounds like you are on windows, and you are opening one of your files in text mode instead of binary.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.