How to open and read a binary file in Python? - python

I have a binary file that I would like to open and read the contents of with Python. How are such binary files opened and read with Python? Are there any specific modules to use for such an operation?

The 'b' flag tells Python to treat the file as binary, so no extra modules are needed. You also haven't said why you want Python to read a binary file, which makes it hard to give a more specific answer.
f = open('binaryfile', 'rb')
print(f.read())

Here is an example:
with open('somefile.bin', 'rb') as f:  # the mode "rb" stands for "read binary"; use it whenever you read a binary file
    data = f.read()  # read the whole file and store its contents in the variable data as a bytes object
print(data)

Reading a file in python is trivial (as mentioned above); however, it turns out that if you want to read a binary file and decode it correctly you need to know how it was encoded in the first place.
I found a helpful example that provided some insight at https://www.devdungeon.com/content/working-binary-data-python,
# Binary to Text
binary_data = b'I am text.'
text = binary_data.decode('utf-8')  # Transform back into human-readable text
print(text)
binary_data = bytes([65, 66, 67]) # ASCII values for A, B, C
text = binary_data.decode('utf-8')
print(text)
but I was still unable to decode some files created at my workplace because they used an unknown encoding.
Once you know how a file is encoded, you can read it piece by piece and decode each piece accordingly.
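As a rough illustration of decoding a file once its layout is known, here is a minimal sketch using the standard struct module. The record layout (a little-endian unsigned 32-bit id, a 32-bit float, and a 4-byte ASCII tag) and the names RECORD and decode_records are assumptions made up for this example, not a format taken from any real file.

```python
import struct

# Hypothetical record layout (an assumption for illustration only):
# little-endian unsigned 32-bit id, 32-bit float, 4-byte ASCII tag.
RECORD = struct.Struct('<If4s')


def decode_records(raw):
    """Yield (id, value, tag) tuples from bytes that hold whole records."""
    for offset in range(0, len(raw) - RECORD.size + 1, RECORD.size):
        rec_id, value, tag = RECORD.unpack_from(raw, offset)
        yield rec_id, value, tag.decode('ascii')


# Build two sample records in memory; a real program would instead use
# the bytes returned by f.read() on a file opened with 'rb'.
sample = RECORD.pack(1, 2.5, b'ABCD') + RECORD.pack(2, -1.0, b'WXYZ')
records = list(decode_records(sample))
print(records)
```

The format string is where the knowledge of the encoding lives: if the real file uses a different byte order or field sizes, only that string needs to change.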

Related

Edit Minecraft .dat File in Python

I'm looking to edit a Minecraft Windows 10 level.dat file in python. I've tried using the package nbt and pyanvil but get the error OSError: Not a gzipped file. If I print open("level.dat", "rb").read() I get a lot of nonsensical data. It seems like it needs to be decoded somehow, but I don't know what decoding it needs. How can I open (and ideally edit) one of these files?
To read the data, just do:
from nbt import nbt
nbtfile = nbt.NBTFile("level.dat", 'rb')
print(nbtfile)  # Here you should get a TAG_Compound('Data')
print(nbtfile["Data"].tag_info())  # "Data" is the name shown by the line above
for tag in nbtfile["Data"].tags:  # This loop shows each entry
    print(tag.tag_info())
As for editing:
# Writing data (changing the difficulty value)
nbtfile["Data"]["Difficulty"].value = 2
print(nbtfile["Data"]["Difficulty"].tag_info())
nbtfile.write_file("level.dat")
EDIT:
It looks like Mojang doesn't use the same format for Java and Bedrock: Bedrock's level.dat file is stored in little-endian format and uses uncompressed UTF-8.
As an alternative, Amulet-NBT is supposed to be a Python library written in Cython for reading and editing NBT files (it supposedly works with Bedrock too).
Nbtlib also seems to work, as long as you set byteorder="little" when loading the file.
Let me know if you need more help.
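The "Not a gzipped file" error from the question can be checked for up front, because every gzip stream starts with the two bytes 0x1f 0x8b. The following is a minimal sketch using only the standard library; the function name read_maybe_gzipped and the sample file names are made up for illustration, and it only returns the raw bytes, it does not parse NBT.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def read_maybe_gzipped(path):
    """Return the file's bytes, decompressing only if the gzip magic is present."""
    with open(path, 'rb') as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        with gzip.open(path, 'rb') as f:
            return f.read()
    with open(path, 'rb') as f:
        return f.read()


# Demonstration with two throwaway files holding the same payload,
# one stored as-is and one gzip-compressed.
payload = b'\x0a\x00\x00raw'
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, 'plain.dat')
    gz_path = os.path.join(tmp, 'packed.dat')
    with open(plain_path, 'wb') as f:
        f.write(payload)
    with gzip.open(gz_path, 'wb') as f:
        f.write(payload)
    plain = read_maybe_gzipped(plain_path)
    unpacked = read_maybe_gzipped(gz_path)

print(plain == unpacked)
```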
You'll have to give the path either relative to the current working directory:
path/to/file.dat
Or you can use the absolute path to the file:
C:/user/dir/path/to/file.dat
Read the data, replace the values, and then write it back:
# Read in the file (binary mode, since a .dat file is not plain text)
with open('file.dat', 'rb') as file:
    filedata = file.read()
# Replace the target bytes (both values here are placeholders)
filedata = filedata.replace(b'value to find', b'your replacement or edit')
# Write the file out again
with open('file.dat', 'wb') as file:
    file.write(filedata)

Python binary file write directly from string

I have the byte-code of a png-file in a string variable. How do I write it to .png file without python trying to encode it? The string is '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\n\x00\x00\x00\x07\x08\x02\x00\x00\x00\xbe\xceK4\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00\x04gAMA\x00\x00\xb1\x8f\x0b\xfca\x05\x00\x00\x00\tpHYs\x00\x00\x0e\xc3\x00\x00\x0e\xc3\x01\xc7o\xa8d\x00\x00\x00DIDAT\x18Wc\xf8\xff\xff\xff\xaf\xfd\x07\xdf[:\xbc\x95Q\x81 \xfb\xc7\xaa\xb5#q \x00I#\xcb\xc1\x11D\x11H\xfa\xdb\x94\x19hr\x10\xf4NY\x1b$\x8d\x0c\x90\x95~\xad\xacE\x97F\x03\x94H\xff\xff\x0f\x00\x1f]\xa2\x03U|Z\xa3\x00\x00\x00\x00IEND\xaeB`\x82'
Edit: I feel like you might need more info on my situation. I am trying to make a little encryption program, and although it works on strings, I want to make it work for any file too. I am reading a .png file in byte mode (which gives the string mentioned above), and after it has been encrypted and decrypted, I have a string with the exact same content, but no way to put it back into a file.
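For Python 3, you have to open the file in binary write mode and encode the string to bytes. Use latin-1 so that every character from '\x00' to '\xff' maps to exactly one byte; the default UTF-8 encoding would turn characters such as '\x89' into two bytes and corrupt the file:
with open('filename', 'wb') as f:
    f.write(the_string.encode('latin-1'))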
You could try PyPNG, which looks like a possible solution:
http://pythonhosted.org/pypng/ex.html#writing
This will let you write binary data to a file in Python, provided bytecode is a bytes object:
with open('filename', 'wb') as f:
    f.write(bytecode)
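To see why the choice of encoding matters when byte values are held in a str, here is a small sketch using only the first eight characters of the string from the question. The variable names are chosen for illustration.

```python
# The first characters of the PNG data from the question, held in a str.
png_header_str = '\x89PNG\r\n\x1a\n'

# latin-1 maps code points 0-255 to the same byte values, one byte each.
as_latin1 = png_header_str.encode('latin-1')

# UTF-8 needs two bytes for any code point above 127, which alters the data.
as_utf8 = png_header_str.encode('utf-8')

print(as_latin1)  # b'\x89PNG\r\n\x1a\n'
print(as_utf8)    # b'\xc2\x89PNG\r\n\x1a\n'

# Decoding with latin-1 restores the original string exactly.
print(as_latin1.decode('latin-1') == png_header_str)  # True
```

If the data never needs to be a str at all, keeping it as bytes from the initial read in 'rb' mode through to the final write in 'wb' mode avoids the question of encoding entirely.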

Easy way to view and save the binary of a file?

What is the easiest way to get the underlying binary code (0s and 1s) for a given file? The context for this question is that I want a Python function which takes a file name, looks it up, and gathers the binary code for that file before either storing it somewhere or returning it. After this I want to do some manipulations on the binary data.
The underlying code for a file is available from the .read() method of the file object. Use the b mode modifier when you open the file:
with open("input_file.bin", "rb") as input_file:
    bits = input_file.read()
If you want to easily manipulate the bits after reading them in, you might want to convert them to a bitarray:
from bitarray import bitarray
with open("input_file.bin", "rb") as input_file:
    chars = input_file.read()
bits = bitarray()
bits.frombytes(chars)
print(bits.count(1), bits.count(0))
References:
https://docs.python.org/2/library/functions.html#open
https://pypi.python.org/pypi/bitarray/0.8.1
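If installing bitarray is not an option, the 0s and 1s can also be produced with the standard library alone. This is a minimal sketch; the helper names to_bit_string and from_bit_string are made up for this example, and a string of digits uses far more memory than the original bytes, so it is mainly useful for inspection.

```python
def to_bit_string(data):
    """Render bytes as a string of 0s and 1s, eight digits per byte."""
    return ''.join(f'{byte:08b}' for byte in data)


def from_bit_string(bits):
    """Rebuild bytes from a string produced by to_bit_string."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


sample = b'AB'  # stands in for the result of f.read() on a file opened with 'rb'
bits = to_bit_string(sample)
print(bits)                   # 0100000101000010
print(from_bit_string(bits))  # b'AB'
```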

Outputting string to binary file doesn't work

For some reason, I cannot get a simple string to be output to a binary file with python.
Here is my code:
strin = bytes(strin, '3DFILE')
dataH = struct.pack('s', strin)
outFile.write(dataH)
I'm trying to write a 3D model exporter for a game I am making with Blender. Can someone please help me out here, or give me an example? I get the error that string is not defined.
Python 3 strings are sequences of unicode characters. The characters are abstract, and they have no binary representation until you say what encoding should be used.
If you have binary data, you can write it to the binary file (opened with binary mode like outFile = open(filename, 'wb') ... outFile.close()) without problem. However, writing binary data to the file opened in text mode cannot be done. It was different in Python 2 where strings were actually sequences of bytes and even the open text file object did not care.
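A minimal sketch of the approach described above: encode the text explicitly, pack it with struct, and write the result to a binary stream. The header layout (a 6-byte tag followed by a 32-bit version number) is an assumption made up for this example, and io.BytesIO stands in for a file opened with 'wb' so the snippet runs without touching the disk.

```python
import io
import struct

# str -> bytes via an explicit encoding name (not an arbitrary label).
magic = '3DFILE'.encode('ascii')

# '6s' packs exactly six bytes; a bare 's' would pack only one.
# The version field (1) is a hypothetical addition for illustration.
header = struct.pack('<6sI', magic, 1)

out_file = io.BytesIO()  # stands in for open(filename, 'wb')
out_file.write(header)
written = out_file.getvalue()
print(written)  # b'3DFILE\x01\x00\x00\x00'
```

Note that the second argument of bytes(text, ...) must be the name of a character encoding such as 'ascii' or 'utf-8'.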

python opens text file with a space between every character

Whenever I try to open a .csv file with the python command
fread = open('input.csv', 'r')
it always opens the file with spaces between every single character. I'm guessing it's something wrong with the text file because I can open other text files with the same command and they are loaded correctly. Does anyone know why a text file would load like this in python?
Thanks.
Update
Ok, I got it with the help of Jarret Hardie's post
this is the code that I used to convert the file to ascii
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
mytext = mytext.encode('ascii', 'ignore')
fwrite = open('input-ascii.csv', 'wb')
fwrite.write(mytext)
Thanks!
The post by recursive is probably right... the contents of the file are likely encoded with a multi-byte charset. If this is, in fact, the case you can likely read the file in python itself without having to convert it first outside of python.
Try something like:
fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')
The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding... in this example, I've used utf-16, but YMMV. This will convert the file to unicode. If you truly have a file with multi-byte chars, I don't recommend converting it to ascii as you may end up losing a lot of the characters in the process.
EDIT: Thanks for uploading the file. There are two bytes at the front of the file which indicate that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested... you'll see something in the text version like 'I.D.|.' (etc). The dot is the extra byte for each char.
The code snippet above seems to work on my machine with that file.
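The two leading bytes mentioned above are a byte-order mark (BOM), and checking for one is a cheap way to guess the encoding before decoding. Here is a minimal sketch; the function name guess_encoding and the 'ascii' fallback are assumptions for illustration, and a file without a BOM can still be in any encoding.

```python
import codecs

# Byte-order marks; the 4-byte UTF-32 marks are checked first because
# the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]


def guess_encoding(raw, default='ascii'):
    """Return a codec name based on a leading byte-order mark, else default."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return default


# Sample bytes that stand in for open('input.csv', 'rb').read().
raw = 'ID|Name\n'.encode('utf-16')
encoding = guess_encoding(raw)
text = raw.decode(encoding)
print(encoding, repr(text))
```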
The file is encoded in some unicode encoding, but you are reading it as ascii. Try to convert the file to ascii before using it in python.
Isn't a CSV a simple text file with values separated by commas?
Just try to open it with a text editor to see if the file is correctly formed.
To read an encoded file, you can simply replace open with codecs.open:
import codecs
fread = codecs.open('input.csv', 'r', 'utf-16')
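In Python 3, the built-in open accepts an encoding argument directly, so the same idea works without codecs. The sketch below writes a small UTF-16 file to a temporary directory and reads it back with the csv module; the file contents are made up for illustration.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'input.csv')

    # Create a small UTF-16 sample file to read back.
    with open(path, 'w', encoding='utf-16', newline='') as f:
        f.write('id,name\r\n1,Zoë\r\n')

    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding='utf-16', newline='') as f:
        rows = list(csv.reader(f))

print(rows)  # [['id', 'name'], ['1', 'Zoë']]
```

Unlike the conversion to ASCII with 'ignore', this keeps non-ASCII characters such as the "ë" intact.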
It never occurred to me, but as truppo said, there must be something wrong with the file.
Try to open the file in Excel/BrOffice Calc and save it as CSV again.
If the problem persists, try a subset of the data: the first 10, last 10, or an intermediate 10 lines of the file.
Open the file in binary mode, 'rb'. Check it in a HEX Editor and check for null padding '00'. Open the file in something like Scintilla Text Editor to check the characters present in the file.
Here's the quick and easy way, esp if python won't parse the input correctly
sed 's/ \(.\)/\1/g'
