Reading and writing binary data of dynamic sizes question - python

I am trying to read data from a file in binary mode and manipulate that data.
import os

try:
    resultfile = open("binfile", "rb")
except IOError:
    print "Error"

resultsize = os.path.getsize("binfile")
There is a 32-byte header, which I parse fine, then the buffer of binary data starts. The data can be any size from 16 to 4092 bytes and can be in any format, from text to a PDF or an image or anything else. The header has the size of the data, so to get this information I do
contents = resultfile.read(resultsize)
and this puts the entire file into a string buffer. I found out this is probably my problem, because when I try to copy chunks of the hex data from "contents" into a new file, some bytes do not copy correctly, so PDFs and images come out corrupted.
Printing out a bit of the file string buffer in the interpreter yields for example something like "%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n" when I just want the bytes themselves in order to write them to a new file. Is there an easy solution to this problem that I am missing?
Here is an example hex dump comparing the PDF written by my Python code with the real PDF:
mine: 25 50 44 46 2D 31 2E 35 0D 0D 0A 25 B5 B5 B5 B5 0D 0D 0A 31 20 30 20 6F 62 6A 0D 0D 0A
real: 25 50 44 46 2D 31 2E 35 0D 0A 25 B5 B5 B5 B5 0D 0A 31 20 30 20 6F 62 6A
It seems like a 0D is being added whenever there is a 0D 0A. In image files it might be a different byte, I don't remember and might have to test it.
My code to write the new file is pretty simple, using contents as the string buffer holding all the data.
fbuf = contents[offset+8:size+offset]
fl = open(fname, 'a')
fl.write(fbuf)
This is being called in a loop based on a signature that is found in the header. offset+8 is the beginning of the actual PDF data and size is the size of the chunk to copy.

You need to open your output file in binary mode, as you do your input file. Otherwise, newline characters may get changed. You can see that this is what happens in your hex dump: 0A characters ('\n') are changed into 0D 0A ('\r\n').
This should work:
input_file = open('f1', 'rb')
contents = input_file.read()
#....
data = contents[offset+8:size+offset] #for example
output_file = open('f2', 'wb')
output_file.write(data)

The result you get is "just the bytes themselves". You can write() them to an open file to copy them.
"It seems like a 0D is being added whenever there is a 0D 0A"
Sounds like you are on Windows, and you are opening one of your files in text mode instead of binary.
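A minimal, self-contained sketch of that fix (the sample bytes are taken from the hex dump above; the file names are placeholders): both files are opened in binary mode, so no newline translation can happen and the copy is byte-for-byte identical.

```python
# Sketch of a binary-safe copy. On Windows, opening the output in
# text mode ('w') would turn every 0A byte into 0D 0A; 'wb' writes
# the bytes through unchanged.
data = b"%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n"

with open("binfile", "wb") as f:      # create a small sample input
    f.write(data)

with open("binfile", "rb") as f:      # read in binary mode
    contents = f.read()

with open("copy.bin", "wb") as f:     # write in binary mode
    f.write(contents)

with open("copy.bin", "rb") as f:
    assert f.read() == data           # byte-for-byte identical
```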

Related

python read binary file byte by byte and read line feed as 0a

This might seem pretty stupid, but I'm a complete newbie in python.
So, I have a binary file that holds data as
47 40 ad e8 66 29 10 87 d7 73 0a 40 10
When I tried to read it with python by
with open(absolutePathInput, "rb") as f:
    for line in f:
        for byte, nextbyte in zip(line[:], line[1:]):
            if state == 'wait_for_sync_1':
                if (byte == 0x10) and (nextbyte == 0x87):
                    state = 'message_id'
I get
all of the bytes, but 0a is read as a line feed (i.e. \n). It seems that the file iterator treats 0a as a line-feed character and reads only up to it. How can I correct this so that 0a is read as an ordinary byte value?
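One way around this (a sketch, not taken from the thread): read the whole file as a bytes object with a single read() and scan the buffer directly, so 0a is just another byte value and never acts as a line ending. The buffer below uses the bytes from the question; in practice it would come from open(path, "rb").read().

```python
# Scan raw bytes for the 0x10 0x87 sync pair without iterating
# by lines, so 0x0A is treated as an ordinary byte.
data = bytes([0x47, 0x40, 0xAD, 0xE8, 0x66, 0x29,
              0x10, 0x87, 0xD7, 0x73, 0x0A, 0x40, 0x10])

def find_sync(buf):
    """Return the index of the first 0x10 0x87 pair, or -1."""
    for i in range(len(buf) - 1):
        if buf[i] == 0x10 and buf[i + 1] == 0x87:
            return i
    return -1

print(find_sync(data))  # → 6
```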

read and write binary files in python

I have a binary file called "input.bin".
I am practicing how to work with such files (read them, change the content and write into a new binary file).
the contents of input file:
03 fa 55 12 20 66 67 50 e8 ab
which is in hexadecimal notation.
I want to make a output file which is simply the input file with the value of each byte incremented by one.
here is the expected output:
04 fb 56 13 21 67 68 51 e9 ac
which also will be in hexadecimal notation.
I am trying to do that in python3 using the following command:
with open("input.bin", "rb") as binary_file:
    data = binary_file.read()

for item in data:
    item2 = item + 1
    with open("output.bin", "wb") as binary_file2:
        binary_file2.write(item2)
but it does not return what I want. do you know how to fix it?
You want to open the output file just once, read all the input bytes, and write the incremented bytes in a single call:
with open("input.bin", "rb") as binary_file:
    data = binary_file.read()

with open("output.bin", "wb") as binary_file2:
    binary_file2.write(bytes(item + 1 for item in data))
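As a quick sanity check, here is a runnable Python 3 round trip of that answer, using the sample bytes from the question. The & 0xFF wrap-around is an addition of mine, needed only if an input byte could be 0xFF.

```python
# Round-trip check for the increment-by-one copy (Python 3).
# File names follow the question; sample bytes are its hex dump.
sample = bytes([0x03, 0xFA, 0x55, 0x12, 0x20, 0x66, 0x67, 0x50, 0xE8, 0xAB])

with open("input.bin", "wb") as f:
    f.write(sample)

with open("input.bin", "rb") as binary_file:
    data = binary_file.read()

with open("output.bin", "wb") as binary_file2:
    # & 0xFF wraps around in case an input byte is 0xFF
    binary_file2.write(bytes((item + 1) & 0xFF for item in data))

with open("output.bin", "rb") as f:
    print(f.read().hex())  # → 04fb561321676851e9ac
```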

Converting broken byte string from unicode back to corresponding bytes

The following code retrieves an iterable object of strings in rows, which contains a PDF byte stream. Each row was of type str. The resulting file was in PDF format and could be opened.
with open(fname, "wb") as fd:
for row in rows:
fd.write(row)
Due to a new C library and changes in the Python implementation, the str changed to unicode. The corresponding content changed as well, so my PDF file is broken.
Starting bytes of first row object:
old row[0]: 25 50 44 46 2D 31 2E 33 0D 0A 25 E2 E3 CF D3 0D 0A ...
new row[0]: 25 50 44 46 2D 31 2E 33 0D 0A 25 C3 A2 C3 A3 C3 8F C3 93 0D 0A ...
I aligned the corresponding byte positions here, so it looks like a unicode problem.
I think this is a good start but I still have a unicode string as input...
>>> "\xc3\xa2".decode('utf8') # but as input I have u"\xc3\xa2"
u'\xe2'
I already tried several calls of encode and decode so I need a more analytical way to fix this. I can't see the wood for the trees. Thank you.
When you find u"\xc3\xa2" in a Python unicode string, it often means that you have read a UTF-8 encoded file as if it were Latin-1 encoded. So the best thing to do is certainly to fix the initial read.
That being said if you have to depend on broken code, the fix is still easy: you just encode the string as Latin1 and then decode it as UTF-8:
fixed_u_str = broken_u_str.encode('Latin1').decode('UTF-8')
For example:
u"\xc3\xa2\xc3\xa3".encode('Latin1').decode('utf8')
correctly gives u"\xe2\xe3" which displays as âã
This looks like you should be doing
fd.write(row.encode('utf-8'))
assuming the type of row is now unicode (this is my understanding of how you presented things).
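In Python 3 terms (where str is always unicode), the same Latin-1/UTF-8 round trip from the first answer can be sketched and checked like this:

```python
# Python 3 sketch of the Latin-1/UTF-8 round trip described above.
# 'broken' holds the UTF-8 bytes of 'âã' misread as Latin-1 text.
broken = "\xc3\xa2\xc3\xa3"

# Re-encode as Latin-1 to recover the original bytes, then decode
# them properly as UTF-8.
fixed = broken.encode("latin-1").decode("utf-8")

print(fixed)  # → âã
```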

How to read hex values at specific addresses in Python?

Say I have a file and I'm interested in reading and storing hex values at certain addresses, like the snippet below:
22660 00 50 50 04 00 56 0F 50 25 98 8A 19 54 EF 76 00
22670 75 38 D8 B9 90 34 17 75 93 19 93 19 49 71 EF 81
I want to read the value at 0x2266D, and be able to replace it with another hex value, but I can't understand how to do it. I've tried using open('filename', 'rb'), however this reads it as the ASCII representation of the values, and I don't see how to pick and choose which addresses I want to change.
Thanks!
Edit: For an example, I have
rom = open("filename", 'rb')
for i in range(5):
    test = rom.next().split()
    print test
rom.close()
This outputs: ['NES\x1a', 'B\x00\x00\x00\x00\x00\x00\x00\x00\x00!\x0f\x0f\x0f(\x0f!!', '!\x02\x0f\x0f', '!\x0f\x01\x08', '!:\x0f\x0f\x03!', '\x0f', '\x0f\x0f', '!', '\x0f\x0f!\x0f\x03\x0f\x12', '\x0f\x0f\x0f(\x02&%\x0f', '\x0f', '#', '!\x0f\x0f1', '!"#$\x14\x14\x14\x13\x13\x03\x04\x0f\x0f\x03\x13#!!\x00\x00\x00\x00\x00!!', '(', '\x0f"\x0f', '#\x14\x11\x12\x0f\x0f\x0f#', '\x10', "5'4\x0270&\x02\x02\x02\x02\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f126&\x13\x0f\x0f\x0f\x13&6222\x0f", '\x1c,', etc etc.
Much more than 5 bytes, and while some of it is in hex, some has been replaced with ASCII.
There's no indication that some of the bytes were replaced by their ASCII representations. Some bytes happen to be printable.
With a binary file, you can simply seek to the offset and write the bytes in. Working with the line iterator on a binary file is problematic, as there are no meaningful "lines" in a binary blob.
You can do in-place editing as follows:
with open("filename", "rb+") as f:
    f.seek(0x2266D)
    the_byte = f.read(1)
    if len(the_byte) != 1:
        pass  # something's wrong; bail out ...
    else:
        transformed_byte = your_function(the_byte)
        f.seek(-1, 1)  # back one byte relative to the current position
        f.write(transformed_byte)
But of course, you may want to do the edit on a copy, either in-memory (and commit later, as in the answer of @JosepValls), or on a file copy. The problem with gulping the whole file into memory is, of course, that sometimes the system may choke ;) For that purpose you may want to mmap part of the file.
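The mmap variant mentioned above could be sketched like this (the file name, size, and offset are placeholders for illustration; the file must be at least as long as the offset being patched):

```python
import mmap

# Create a small sample file so the sketch is self-contained.
with open("rom.bin", "wb") as f:
    f.write(bytes(16))

# Patch one byte in place through a memory map.
with open("rom.bin", "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:
        m[0x0D] = 0xFF   # edit the byte at offset 0x0D
        m.flush()        # push the change back to the file

with open("rom.bin", "rb") as f:
    assert f.read()[0x0D] == 0xFF
```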
Given that is not a very big file (roms should fit fine in today's computer's memory), just do data = open('filename', 'rb').read(). Now you can do whatever you want to the data (if you print it, it will show ascii, but that is just data!). Unfortunately, string objects don't support item assignment, see this answer for more:
Change one character in a string in Python?
In your case (note that str(0xFF) would insert the three characters "255", not a single byte):
data = data[:0x2266D] + '\xff' + data[0x2266E:]

How do I translate a hash function from Python to R

I'm trying to translate the following Python script to R, but I'm having difficulty that reflects the fact that I am not well versed in Python or R.
Here is what I have for Python:
import hashlib, hmac
print hmac.new('123456', 'hello'.encode('utf-8'),hashlib.sha256).digest()
When I run this in Python I'm getting a message that says standard output is empty.
Question: What am I doing wrong?
Here's what I'm using for R
library('digest')
hmac('123456','hello', algo='sha256', serialize=FALSE)
My questions with the R code are:
How do I encode to utf-8 in R. I couldn't find a package.
What are the correct parameter settings for serialize and raw for R given I want to match the output of the Python function above (once its working).
If you want to get the bytes of the hash in R, set raw=TRUE. Then you can write it out as a binary file:
library('digest')
x <- hmac('123456', enc2utf8('hello'), algo='sha256', serialize=FALSE, raw=TRUE)
writeBin(x, "Rout.txt")
If you're not outputting text, the encoding doesn't matter. These are raw bytes. The only difference in the output is that the Python print seems to be adding a newline character. If I run hexdump on the R file I see:
0000000 ac 28 d6 02 c7 67 42 4d 0c 80 9e de bf 73 82 8b
0000010 ed 5c e9 9c e1 55 6f 4d f8 e2 23 fa ee c6 0e dd
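For completeness, a Python 3 check of the same HMAC. In Python 3 both the key and the message must be bytes objects, which is one likely reason the original print attempt produced nothing usable:

```python
import hashlib
import hmac

# Python 3 version of the digest from the question; the key must
# be bytes, and the message is encoded explicitly as in the original.
digest = hmac.new(b"123456", "hello".encode("utf-8"), hashlib.sha256).digest()

print(digest.hex())        # 64 hex characters
print(len(digest))         # → 32
```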
