Get a binary text from a file in python [duplicate] - python

This question already has answers here:
How to open and read a binary file in Python?
(3 answers)
Closed 1 year ago.
How can I get zeros and ones that make a file? For example, how can I open a file with Python and get the zeros and ones that make it up? And convert those zeros and ones again to a file?
Thanks for your help!

This question is very vague, so I will try and answer some of the possible questions I think you are asking.
How do I open a non-text file in binary mode
Some files must be opened in binary mode rather than text mode. An easier way to think of this is opening a file in "raw" (binary) mode versus "plaintext" (text) mode.
APIs that work with PDF files, and with certain other file types, often require this, because those files contain bytes that are not valid text and headers that would be garbled if the contents were read as plain strings.
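To do this, change the mode argument of the call to open() to one of these:
rb opens a binary file for reading. The file pointer is placed at the beginning of the file.
rb+ opens an existing binary file for both reading and writing, without truncating it. The file pointer is placed at the beginning of the file.
wb+ opens a binary file for both writing and reading. The file is truncated if it exists and created if it does not.
ab+ opens a binary file for both appending and reading. The file is created if it does not exist, and writes always go to the end of the file.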
For example:
with open("filename.pdf", "rb") as pdf_file:
    ...  # Do things
How do I get the binary values of individual string characters
If you are looking to open a file and get a binary association to a character you can use the ord() function combined with the bin() function:
ord("A") # 65
bin(ord("A")) # '0b1000001'
You can then loop through each character of a file and find a binary representation of each letter this way. This is often used in cryptography, like I did for this project.
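If what you want is literally the zeros and ones of the whole file, and a way to turn them back into a file, something like the following may work. This is only a sketch: the function names file_to_bits and bits_to_file are made up for illustration, and it reads the entire file into memory, so it is not suitable for very large files.

```python
from pathlib import Path


def file_to_bits(path):
    """Return the file's contents as a string of '0' and '1' characters."""
    data = Path(path).read_bytes()
    # Format each byte as exactly 8 binary digits, most significant bit first.
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_file(bits, path):
    """Write a string of '0'/'1' characters back out as raw bytes."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Convert each group of 8 digits back into one byte.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    Path(path).write_bytes(data)
```

Calling bits_to_file(file_to_bits("in.bin"), "out.bin") should produce a byte-for-byte copy of the input (the file names here are placeholders).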
If neither of those solves your issue, please clarify what you mean in the original question so I can better address it.

Related

Substitute file content in Python [duplicate]

This question already has answers here:
Confused by python file mode "w+" [duplicate]
(11 answers)
Closed 8 years ago.
I imagine this is a question asked already twenty thousand times, but I cannot understand why the file is always empty. I want to open a file, remove a string from the whole file and then rewrite the content, but the file ends up being empty. This is the code I use:
f = open(filename,'w+')
f.write(f.read().replace(str_to_del,""))
f.close()
But the file is always empty. If I instead use "r+" then the content is appended and I have duplicate text in the file. I'm using Python 3.3. What am I missing?
Opening the file in w+ mode truncates the file immediately. So, your f.read() is guaranteed to return an empty string.
You can do this by opening the file in r+ mode, reading it, calling f.seek(0), writing the new content, and then calling f.truncate() so that nothing from the old, longer content is left over at the end. Or by opening the file in r mode, reading it, closing it, reopening it in w mode, and writing. Or, better, by writing a temporary file and moving it over the original, which gives you "atomic" behavior: no possibility of ending up with a half-written file.
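Here is a rough sketch of the first and last approaches. The function names are made up for illustration, both versions assume the file fits in memory and uses the platform's default text encoding, and the temporary-file version relies on os.replace (available since Python 3.3). Note that the replacement file gets the permissions of the temporary file, not of the original.

```python
import os
import tempfile


def remove_string_in_place(filename, str_to_del):
    """Rewrite the file through a single r+ handle."""
    with open(filename, "r+") as f:
        content = f.read()
        f.seek(0)  # go back to the start before writing
        f.write(content.replace(str_to_del, ""))
        f.truncate()  # drop leftover bytes from the old, longer content


def remove_string_atomically(filename, str_to_del):
    """Write a temporary file, then move it over the original."""
    with open(filename, "r") as f:
        content = f.read()
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content.replace(str_to_del, ""))
        os.replace(tmp_path, filename)
    except BaseException:
        os.remove(tmp_path)
        raise
```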

Python readline() fails when trying to read large (~ 13GB) csv file [duplicate]

This question already has answers here:
Unable to read huge (20GB) file from CPython
(2 answers)
Closed 9 years ago.
I'm a Python newbie and had a quick question regarding memory usage when reading large text files. I have a ~13GB csv I'm trying to read line-by-line, following the Python documentation and more experienced Python users' advice to not use readlines() in order to avoid loading the entire file into memory.
When trying to read a line from the file I get the error below and am not sure what might be causing it. Besides this error, I also notice my PC's memory usage is excessively high. This was a little surprising since my understanding of the readline function is that it only loads a single line from the file at a time into memory.
For reference, I'm using Continuum Analytics' Anaconda distribution of Python 2.7 and PyScripter as my IDE for debugging and testing. Any help or insight is appreciated.
with open(R'C:\temp\datasets\a13GBfile.csv','r') as f:
    foo = f.readline()  #<-- Err: SystemError: ..\Objects\stringobject.c:3902 bad argument to internal function
UPDATE:
Thank you all for the quick, informative and very helpful feedback, I reviewed the referenced link which is exactly the problem I was having. After applying the documented 'rU' option mode I was able to read lines from the file like normal. I didn't notice this mode mentioned in the documentation link I was referencing initially and neglected to look at the details for the open function first. Thanks again.
Unix text files end each line with \n.
Windows text files end each line with \r\n.
When you open a file in text mode, 'r', Python assumes it has the native line endings for your platform.
So, if you open a Unix text file on Windows, Python will look for \r\n sequences to split the lines. But there won't be any, so it'll treat your whole file as one giant 13-billion-character line. So that readline() call ends up trying to read the whole thing into memory.
The fix for this is to use universal newlines mode, by opening the file in mode rU. As explained in the docs for open:
supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'.
So, instead of searching for \r\n sequences to split the lines, it looks for \r\n, or \n, or \r. And there are millions of \n. So, the problem is solved.
A different way to fix this is to use binary mode, 'rb'. In this mode, Python doesn't do any conversion at all, and assumes all lines end in \n, no matter what platform you're on.
On its own, this is pretty hacky—it means you'll end up with an extra \r on the end of every line in a Windows text file.
But it means you can pass the file on to a higher-level file reader like csv that wants binary files, so it can parse them the way it wants to. On top of magically solving this problem for you, a higher-level library will also probably make the rest of your code a lot simpler and more robust. For example, it might look something like this:
import csv

with open(R'C:\temp\datasets\a13GBfile.csv','rb') as f:
    for row in csv.reader(f):
        pass  # do stuff with row, a list of column values
Now each row is automatically split on commas, except that commas that are inside quotes or escaped in the appropriate way don't count, and so on, so all you need to deal with is a list of column values.
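The snippet above is for Python 2.7, which is what the question uses. For anyone on Python 3, the csv module instead expects a file opened in text mode with newline="", so that it can handle the line endings itself. A minimal sketch, where read_rows is a made-up helper name and the file is assumed to be readable with the default encoding:

```python
import csv


def read_rows(path):
    """Yield one parsed row at a time, so the whole file is never in memory."""
    # newline="" leaves line endings untouched so csv can interpret them.
    with open(path, "r", newline="") as f:
        for row in csv.reader(f):
            yield row
```

Because this is a generator, a loop like "for row in read_rows(path)" only holds one row at a time.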

Outputting string to binary file doesn't work

For some reason, I cannot get a simple string to be output to a binary file with python.
Here is my code:
strin = bytes(strin, '3DFILE')
dataH = struct.pack('s', strin)
outFile.write(dataH)
I'm trying to write a 3D model exporter for a game I am making with Blender. Can someone please help me out here, or give me an example? I get the error that string is not defined.
Python 3 strings are sequences of unicode characters. The characters are abstract, and they have no binary representation until you say what encoding should be used.
If you have binary data, you can write it to the binary file (opened with binary mode like outFile = open(filename, 'wb') ... outFile.close()) without problem. However, writing binary data to the file opened in text mode cannot be done. It was different in Python 2 where strings were actually sequences of bytes and even the open text file object did not care.
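As a concrete illustration, here is one way it could look in Python 3. This is a sketch, not your exporter: write_string is a made-up name and UTF-8 is an assumed encoding, so substitute whatever your file format actually requires. Note that the struct format 's' on its own packs a single byte, which is why the length is put in front of it.

```python
import struct


def write_string(path, text, encoding="utf-8"):
    """Encode text to bytes and write it to a file opened in binary mode."""
    data = text.encode(encoding)  # str -> bytes, using an explicit encoding
    # "6s" means "six raw bytes"; a bare "s" would keep only the first byte.
    packed = struct.pack(f"{len(data)}s", data)
    with open(path, "wb") as out_file:
        out_file.write(packed)
    return packed
```

For a plain string like this, out_file.write(data) would write the same bytes; struct becomes useful once you mix in numbers with fixed sizes and byte order.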

How do I stop having carriage returns added to my file output in Python?

I'm trying to parse a BMP file, do some changes and then reassemble a new BMP using Python.
The carriage return seems to be a huge problem. When I open the bitmap file using Notepad++ and search for "\r", the character does not exist. When I read the file in Python (readData = fileIn.read()) and search using readData.find('\r'), it returns -1. Searching for "\n" works fine. All is good for now.
When I try to write this exact same block of text into a new BMP using fileOut.write(readData) and I use Notepad++ to search for "\r", I am able to find it (twice, each corresponding to the preexisting "\n" characters).
Is there a way to write this block of data to a new BMP without "\r" being added automatically? I've tried applying .strip() and .replace('\r','') to the string before writing it to the new file.
You're probably opening the file as text (the default) when you want to open it as binary.
open("example.bmp", "rb") # to [r]ead as [b]inary
open("example.bmp", "wb") # to [w]rite as [b]inary
From the documentation:
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability.
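Put together, a read-modify-write round trip in binary mode might look like the sketch below. copy_binary is a made-up name, and the step where you would change the data is left as a comment.

```python
def copy_binary(src_path, dst_path):
    """Read a file as raw bytes and write the same bytes to a new file."""
    with open(src_path, "rb") as file_in:
        read_data = file_in.read()  # a bytes object, no newline translation
    # ... modify read_data here if needed ...
    with open(dst_path, "wb") as file_out:
        file_out.write(read_data)  # written exactly as-is, no "\r" added
    return read_data
```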
You are opening the file in text mode, while you need binary mode. Find more about open() here:
http://docs.python.org/library/functions.html

Updating value in binary file with Python

I'm trying to figure out how to update the data in a binary file using Python.
I'm already comfortable reading and writing complete files using "array", but I'm having trouble with in place editing.
Here's what I've tried:
my_file.seek(100)
my_array = array.array('B')
my_array.append(0)
my_array.tofile(my_file)
Essentially, I want to change the value of the byte at position 100. The above code does update the value, but then truncates the rest of the file. I want to be able to change the value at position 100, without modifying anything else in the file.
Note that I'm editing multi-gigabyte files, so I don't want to read the entire thing into memory, update memory, and then write back out to disk.
According to the documentation of open(), you should open the file in 'rb+' mode to avoid the truncating behavior.
Are you opening the file in 'r+b' mode?
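For example, something along these lines should overwrite a single byte without touching the rest of the file or reading it into memory. set_byte is a made-up helper name, and it assumes the offset lies inside the existing file.

```python
def set_byte(path, offset, value):
    """Overwrite one byte at the given offset, leaving the rest untouched."""
    # 'r+b' opens an existing file for reading and writing without truncating.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([value]))  # value must be an int in range 0-255
```

With this, set_byte(filename, 100, 0) changes only the byte at position 100.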
