I need to read a simple but large (500MB) binary file in Python 3.6. The file was created by a C program, and it contains 64-bit double precision data. I tried using struct.unpack but that's very slow for a large file.
Here is my simple file read:
def ReadBinary():
    fileName = 'C:\\File_Data\\LargeDataFile.bin'
    with open(fileName, mode='rb') as file:
        fileContent = file.read()
Now I have fileContent. What is the fastest way to decode it into 64-bit double-precision floating point, or read it without the need to do a format conversion?
I want to avoid, if possible, reading the file in chunks. I would like to read it decoded, all at once, like C does.
You can use array.array('d')'s fromfile method:
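import array
import os

def ReadBinary():
    fileName = r'C:\File_Data\LargeDataFile.bin'
    fileContent = array.array('d')
    # fromfile() requires the number of items to read, so derive it from the file size
    itemCount = os.path.getsize(fileName) // fileContent.itemsize
    with open(fileName, mode='rb') as file:
        fileContent.fromfile(file, itemCount)
    return fileContent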
That's a C-level read as raw machine values. mmap.mmap could also work by creating a memoryview of the mmap object and casting it.
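The mmap route mentioned above might look roughly like the following. This is only a sketch: the function name read_doubles_mmap is made up for illustration, and it assumes the file is non-empty, its size is a multiple of 8 bytes, and the doubles were written in the machine's native byte order.

```python
import mmap

def read_doubles_mmap(path):
    """Map the file and reinterpret its bytes as native C doubles."""
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Both views are released (in reverse order) before the map is closed.
            with memoryview(mm) as raw, raw.cast('d') as doubles:
                # tolist() copies the values into a Python list; inside this
                # block, doubles[i] can be indexed directly without copying.
                return doubles.tolist()
```

For a 500MB file the tolist() copy is expensive, so in practice you would probably do the processing inside the with block and index the cast view directly.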
I have the byte-code of a png-file in a string variable. How do I write it to .png file without python trying to encode it? The string is '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\n\x00\x00\x00\x07\x08\x02\x00\x00\x00\xbe\xceK4\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00\x04gAMA\x00\x00\xb1\x8f\x0b\xfca\x05\x00\x00\x00\tpHYs\x00\x00\x0e\xc3\x00\x00\x0e\xc3\x01\xc7o\xa8d\x00\x00\x00DIDAT\x18Wc\xf8\xff\xff\xff\xaf\xfd\x07\xdf[:\xbc\x95Q\x81 \xfb\xc7\xaa\xb5#q \x00I#\xcb\xc1\x11D\x11H\xfa\xdb\x94\x19hr\x10\xf4NY\x1b$\x8d\x0c\x90\x95~\xad\xacE\x97F\x03\x94H\xff\xff\x0f\x00\x1f]\xa2\x03U|Z\xa3\x00\x00\x00\x00IEND\xaeB`\x82'
edit: I feel like you might need more info on my situation: I am trying to make a little encryption program, and although it works on strings, I want to make it work for any file too. I am reading a .png file in byte mode (which gives the string mentioned above), and after it has been encrypted and decrypted, I have a string with the exact same content, but no way to put it back into a file.
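For Python 3, you have to open the file in binary write mode and encode the string to bytes. Use the latin-1 codec, which maps code points 0-255 to the same byte values; the default UTF-8 would turn characters such as '\x89' into two bytes and corrupt the PNG:
with open('filename', 'wb') as f:
    f.write(the_string.encode('latin-1'))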
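To see why the choice of codec matters here, the following sketch compares two encodings of the first eight characters of the PNG string from the question. It assumes every character in the string has a code point below 256; the helper name text_to_raw_bytes is hypothetical.

```python
def text_to_raw_bytes(text):
    """Convert a str whose code points are all below 256 into the same byte values."""
    return text.encode('latin-1')

png_header = '\x89PNG\r\n\x1a\n'

raw = text_to_raw_bytes(png_header)   # one byte per character
utf8 = png_header.encode('utf-8')     # '\x89' becomes the two bytes b'\xc2\x89'

print(len(raw), len(utf8))
```

If the string contains a character above code point 255, the latin-1 encode raises UnicodeEncodeError, which at least fails loudly instead of writing a damaged file.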
You could try using PyPNG, looks like a possible solution:
http://pythonhosted.org/pypng/ex.html#writing
This will let you write binary data to a file in Python (bytecode must be a bytes object):
with open('filename', 'wb') as f:
    f.write(bytecode)
What is the easiest way to get the underlying binary code(0s and 1s) for a given file? The context for this question is that I want a python function which takes a file name, looks it up and gathers the binary code for that file before either storing it somewhere or returning it. After this I want to do some manipulations on the binary file.
The underlying code for a file is available from the .read() method of the file object. Use the b mode modifier when you open the file:
with open("input_file.bin", "rb") as input_file:
    bits = input_file.read()
If you want to easily manipulate the bits after reading them in, you might want to convert them to a bitarray:
from bitarray import bitarray
with open("input_file.bin", "rb") as input_file:
    chars = input_file.read()
bits = bitarray()
bits.frombytes(chars)
print(bits.count(1), bits.count(0))
References:
https://docs.python.org/2/library/functions.html#open
https://pypi.python.org/pypi/bitarray/0.8.1
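If installing bitarray is not an option, counting the set and cleared bits can also be sketched with the standard library alone. The function name count_bits is made up for this example, and it targets Python 3, where iterating over a bytes object yields integers.

```python
def count_bits(data):
    """Return (ones, zeros) for a bytes object."""
    ones = sum(bin(byte).count('1') for byte in data)
    zeros = len(data) * 8 - ones
    return ones, zeros

print(count_bits(b'\x0f\xff'))
```

This only counts bits; for slicing or editing individual bits, bitarray is likely the more convenient tool.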
I have a binary file that I would like to open and read the contents of with Python. How are such binary files opened and read with Python? Are there any specific modules to use for such an operation?
The 'b' flag will get python to treat the file as a binary, so no modules are needed. Also you haven't provided a purpose for having python read a binary file with a question like that.
f = open('binaryfile', 'rb')
print(f.read())
Here is an Example:
with open('somefile.bin', 'rb') as f:  # the second parameter "rb" stands for "read binary"; use it when reading binary files
    data = f.read()  # read the whole file and store its contents in the variable data
print(data)
Reading a file in python is trivial (as mentioned above); however, it turns out that if you want to read a binary file and decode it correctly you need to know how it was encoded in the first place.
I found a helpful example that provided some insight at https://www.devdungeon.com/content/working-binary-data-python,
# Binary to Text
binary_data = b'I am text.'
text = binary_data.decode('utf-8')  # Transform back into human-readable text
print(text)
binary_data = bytes([65, 66, 67]) # ASCII values for A, B, C
text = binary_data.decode('utf-8')
print(text)
but I was still unable to decode some files that my work created because they used an unknown encoding method.
Once you know how it is encoded you can read the file bit by bit and perform the decoding with a function of three.
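As an illustration of decoding once the layout is known, here is a minimal sketch using the standard struct module. The record layout (a little-endian unsigned 16-bit id followed by a 64-bit float) and the names RECORD and decode_records are invented for the example and are not taken from any real file format.

```python
import struct

# Hypothetical layout: little-endian uint16 id followed by a float64 value.
RECORD = struct.Struct('<Hd')

def decode_records(data):
    """Return a list of (id, value) tuples decoded from packed bytes."""
    # iter_unpack requires len(data) to be a multiple of RECORD.size (10 bytes).
    return list(RECORD.iter_unpack(data))

sample = RECORD.pack(7, 1.5) + RECORD.pack(8, -2.0)
print(decode_records(sample))
```

With a different format string the same pattern should apply to other fixed-size layouts; the hard part remains finding out what the layout actually is.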
I have a big binary file. How can I write (prepend) to the beginning of the file?
Ex:
file = 'binary_file'
string = 'bytes_string'
I expect to get a new file with the content: bytes_string_binary_file.
The construction open("filename", "ab") only appends.
I'm using Python 3.3.1.
There is no way to prepend to a file. You must rewrite the file completely:
with open("oldfile", "rb") as old, open("newfile", "wb") as new:
    new.write(string)  # string must be a bytes object, e.g. b'bytes_string'
    new.write(old.read())
If you want to avoid reading the whole file into memory, read it in chunks:
with open("oldfile", "rb") as old, open("newfile", "wb") as new:
    new.write(string)  # write the prepended data first
    for chunk in iter(lambda: old.read(1024), b""):
        new.write(chunk)
Replace 1024 with a value that works best on your system (it is the number of bytes read each time).
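The same rewrite can be wrapped in a helper that replaces the original file once the copy is complete. This is a sketch under some assumptions: the function name prepend_bytes and the '.tmp' suffix are hypothetical, the prefix is a bytes object, and the temporary file lives in the same directory as the original.

```python
import os
import shutil

def prepend_bytes(path, prefix, chunk_size=64 * 1024):
    """Rewrite `path` so that it starts with `prefix` (a bytes object)."""
    tmp_path = path + '.tmp'  # hypothetical temporary name next to the original
    with open(path, 'rb') as old, open(tmp_path, 'wb') as new:
        new.write(prefix)
        # copyfileobj streams the rest in chunks, so the file is never fully in memory
        shutil.copyfileobj(old, new, chunk_size)
    os.replace(tmp_path, path)  # swap the new file into place
```

Usage would be something like prepend_bytes('binary_file', b'bytes_string_'). The file is still rewritten in full; the helper only hides the temporary file.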
Is there an easy way to work in binary with Python?
I have a file of data I am receiving (in 1's and 0's) and would like to scan through it and look for certain patterns in binary. It has to be in binary because due to my system, I might be off by 1 bit or so which would throw everything off when converting to hex or ascii.
For example, I would like to open the file, then search for '0001101010111100110' or some string of binary and have it tell me whether or not it exists in the file, where it is, etc.
Is this doable or would I be better off working with another language?
To convert a byte string into a string of '0' and '1' characters, you can use this one-liner (written for Python 3, where iterating over bytes yields integers):
bin_str = ''.join(format(b, '08b') for b in byte_str)
Combine that with opening and reading the file:
with open(filename, 'rb') as f:
    byte_str = f.read()
Now it's just a simple string search:
if '0001101010111100110' in bin_str:
    print('found at bit offset', bin_str.find('0001101010111100110'))
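Put together, the approach above could be sketched as a small function that reports every bit offset at which the pattern occurs, including matches that are not byte-aligned. The name find_bit_pattern is hypothetical, and the sample bytes were chosen so that the pattern from the question starts one bit into the data.

```python
def find_bit_pattern(data, pattern):
    """Return the bit offsets at which `pattern` (a str of 0s and 1s) occurs in `data`."""
    bits = ''.join(format(byte, '08b') for byte in data)
    offsets = []
    start = bits.find(pattern)
    while start != -1:
        offsets.append(start)
        start = bits.find(pattern, start + 1)  # step by one bit to allow overlaps
    return offsets

sample = bytes([0x0d, 0x5e, 0x60])  # bits: 00001101 01011110 01100000
print(find_bit_pattern(sample, '0001101010111100110'))
```

The intermediate string uses one character per bit, so it takes roughly eight times the size of the input; for very large files you would probably want to process the data in overlapping chunks.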
You would be better off working in another language. Python could do it, for example by opening the file with
file = open("file", "rb")
(appending the b opens it in binary mode; note that "wb" would truncate the file) and then doing a simple search, but to be honest, it is much easier and faster to do it in a lower-level language such as C.