I'm trying to read a file in Python as binary.
I'm interested in four bytes at a time, but I seem to be stuck in the infamous while loop:
with open(filename, "rb") as file:
    while file:
        file.read(4)
print "EOF"
I've been trying this for the past hour, and I never reach the end of the file, even with tiny text files. I did a "print test = file.read(4)" only to see that it prints ""
How can I make sure it stops? My first idea was to make an if statement checking whether file.read(4) (stored in a variable) == ""{4} or something, but this might actually appear in a file, right? So it could potentially stop in the middle of it.
Is the only other option to calculate the size of the file beforehand?
At the end of the file, file.read(..) will return an empty bytes object (or an empty string, depending on your Python version).
Check the return value of the file.read; break if it's empty:
with open(filename, "rb") as file:
    while True:  # --> replaced `file` with `True` to be clear
        data = file.read(4)
        if not data:  # empty => EOF
            # OR if len(data) < 4: if you don't want last incomplete chunk
            break
        # process data
file is an _io.BufferedReader object, not None, so it is never treated as False.
You should check whether the return value of file.read(4) is empty (an empty bytes object or string is treated as False).
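A possible alternative to the explicit while True loop is the two-argument form of iter(), which keeps calling read(4) until it returns the empty-bytes sentinel. This is only a sketch for Python 3; the helper name read_chunks and the sample file are made up for illustration.

```python
import os
import tempfile
from functools import partial


def read_chunks(path, size=4):
    """Return the file's contents as a list of chunks of at most `size` bytes."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read(size) repeatedly and stops
        # as soon as it returns b"", which only happens at end of file.
        return list(iter(partial(f.read, size), b""))


# Hypothetical sample data: 10 bytes, so the last chunk is incomplete.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")
    sample_path = tmp.name

chunks = read_chunks(sample_path)
print(chunks)  # [b'abcd', b'efgh', b'ij']
os.remove(sample_path)
```

Note that the final chunk can be shorter than four bytes, so code that needs full chunks still has to check len(chunk).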
Related
Noob question here. I'm scheduling a cron job for a Python script for every 2 hours, but I want the script to stop running after 48 hours, which is not a feature of cron. To work around this, I'm recording the number of executions at the end of the script in a text file using a tally mark x and opening the text file at the beginning of the script to only run if the count is less than n.
However, my script seems to always run regardless of the conditions. Here's an example of what I've tried:
with open("curl-output.txt", "a+") as myfile:
    data = myfile.read()
    finalrun = "xxxxx"
    if data != finalrun:
        [CURL CODE]
    with open("curl-output.txt", "a") as text_file:
        text_file.write("x")
        text_file.close()
I think I'm missing something simple here. Please advise if there is a better way of achieving this. Thanks in advance.
The problem with your original code is that you're opening the file in a+ mode, which seems to set the seek position to the end of the file (try print(data) right after you read the file). If you use r instead, it works. (I'm not sure that's how it's supposed to be. This answer states it should write at the end, but read from the beginning. The documentation isn't terribly clear).
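The behaviour described above can be checked with a small experiment. This is a sketch for Python 3, where a file opened in a+ mode starts positioned at the end; the temporary tally file is invented for the demonstration.

```python
import os
import tempfile

# Hypothetical tally file containing three marks.
fd, tally_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("xxx")

with open(tally_path, "a+") as f:
    at_end = f.read()      # position starts at the end, so nothing is read
    f.seek(0)              # rewind to the beginning
    from_start = f.read()  # now the whole file is returned

print(repr(at_end), repr(from_start))  # '' 'xxx'
os.remove(tally_path)
```

So if a+ is kept, calling seek(0) before read() is one way to get the existing contents.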
Some suggestions: Instead of comparing against the "xxxxx" string, you could just check the length of the data (if len(data) < 5). Or alternatively, as was suggested, use pickle to store a number, which might look like this:
import pickle

try:
    with open("curl-output.txt", "rb") as myfile:
        num = pickle.load(myfile)
except FileNotFoundError:
    num = 0
if num < 5:
    do_curl_stuff()
    num += 1
    with open("curl-output.txt", "wb") as myfile:
        pickle.dump(num, myfile)
Two more things concerning your original code: You're making the first with block bigger than it needs to be. Once you've read the string into data, you don't need the file object anymore, so you can remove one level of indentation from everything except data = myfile.read().
Also, you don't need to close text_file manually. with will do that for you (that's the point).
This sounds more like a job for scheduling with the at command.
See http://www.ibm.com/developerworks/library/l-job-scheduling/ for different job scheduling mechanisms.
The first bug that is immediately obvious to me is that you are appending to the file even if data == finalrun. So when data == finalrun, you don't run curl but you do append another 'x' to the file. On the next run, data will be not equal to finalrun again so it will continue to execute the curl code.
The solution is of course to nest the code that appends to the file under the if statement.
Well, there is probably a newline character (\n) at the end of the line, which means your file will contain something like xx\n and not simply xx. That is probably why your condition does not work :)
EDIT
What happens if you type the following at the Python command line?
open('filename.txt', 'r').read() # where filename is the name of your file
You will be able to see whether there is a \n or not.
Try using this condition in the if clause instead:
if data.count('x') == 24:
The data string may contain extraneous characters such as newlines. Check repr(data) to see whether it is actually 24 x's.
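To illustrate why counting the marks is more forgiving than comparing whole strings, here is a minimal sketch. The helper name should_run and the limit of 5 are assumptions for the example, not part of the original script.

```python
def should_run(data, limit):
    """Return True while fewer than `limit` tally marks have been recorded."""
    # Count only the marks, so a trailing "\n" or stray spaces do not matter.
    return data.count("x") < limit


# A trailing newline breaks an exact string comparison...
exact_match = "xxxxx\n" == "xxxxx"

# ...but does not affect the count-based check.
below_limit = should_run("xxxx\n", 5)
at_limit = should_run("xxxxx\n", 5)

print(exact_match, below_limit, at_limit)  # False True False
```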
def is_number(file):
    cList = file.read()
    chars = len(cList)
    t = 0
    retlist = []
    while t < chars:
        try:
            int(cList[t])
            int(cList[t + 1])
            x = (cList[t] + cList[t + 1])
            retlist.append(int(x))
            t += 1
        except ValueError:
            try:
                x = int(cList[t])
                retlist.append(x)
            except ValueError:
                pass
        t += 1
    retlist.sort()
    return retlist
OK, so this is my code that reads a file, takes all the numbers up to 99 and adds them to a list. But when I return the list it's suddenly empty for some reason. I can't figure out why, please help!
def main():
    while True:
        try:
            f = input("Enter the name of the file: ")
            file = open(f + ".txt", "r")
            is_number(file)
            break
        except IOError:
            pass
            print("The file %s could not be found, try again!" % (f))
    numList = is_number(file)
    print(numList)

main()
The code that calls the function.
Python (and most other languages) have the notion of a "file pointer" -- it's a reference to some location in the file. All reading and writing starts at the file pointer. For example, if the file pointer is at the beginning of the file, calling read() will read the entire file. If the file pointer were moved, say, 100 characters forward, calling read() would skip those first 100 characters.
Reading will always advance the file pointer to immediately after the point it stopped reading. So, for example, if you asked it to read only 100 bytes, the file pointer will advance 100 bytes and the next read would read from there.
In your code, is_number accepts a file handle and immediately reads the entire contents of the file. When it does this, the file pointer is moved to the end of the file. After your loop exits, you call is_number again on the last file that was opened. Since the file pointer is at the end of the file and hasn't been moved, there's nothing to read, so read() returns an empty string and numList ends up as an empty list.
Just to add to Bryan's answer, you can use the seek() method of files to restart reading from the beginning. For example, if f is the name of your file handle, f.seek(0) will point to the beginning of the file.
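The effect of the file pointer and of seek(0) can be seen in a short experiment. This is just a sketch; the temporary file and its contents are invented for the demonstration.

```python
import os
import tempfile

# Hypothetical input file.
fd, numbers_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("12 7 abc 99")

with open(numbers_path, "r") as f:
    first = f.read()   # reads everything; the pointer is now at the end
    second = f.read()  # nothing left to read
    f.seek(0)          # move the pointer back to the start
    third = f.read()   # reads everything again

os.remove(numbers_path)
print(repr(first), repr(second), repr(third))
```

In the question's code, the simpler fix is probably to call is_number only once and keep its return value, rather than rewinding.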
I wrote a program that uses bitarray 0.8.0 to write bits to a binary file. I would like to add a header to this binary file to describe what's inside the file.
My problem is that I think the method "fromfile" of bitarray necessarily starts reading the file from the beginning. I could make a workaround so that the reading program gets the header and then rewrite a temporary file containing only the binary portion (bitarray tofile), but it doesn't sound too efficient of an idea.
Is there any way to do this properly?
My file could look something like the following where clear text is the header and binary data is the bitarray information:
...{(0, 0): '0'}{(0, 0): '0'}{(0, 0): '0'}[binary data]...
Edit:
I tried the following after reading the response:
bits = ""
b = bitarray()
with open(Filename, 'rb') as file:
    #Get header
    byte = file.read(1)
    while byte != "":
        # read header
        byte = file.read(1)
    b.fromfile(file)
print b.to01()
print "len(b.to01())", len(b.to01())
The length is 0 and the print of "to01()" is empty.
However, the print of the header is fine.
My problem is that I think the method "fromfile" of bitarray necessarily starts reading the file from the beginning.
This is likely false; it, like most other file read routines, probably starts at the current position within the file, and stops at EOF.
EDIT:
From the documentation:
fromfile(f, [n])
Read n bytes from the file object f and append them to the bitarray interpreted as machine values. When n is omitted, as many bytes are read until EOF is reached.
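Note that the loop in the edited question keeps reading until read(1) returns an empty value, which is EOF, so nothing is left for fromfile. One way to stop at the end of the header is to store the header length first. The sketch below uses only the standard library; the length-prefixed layout and the helper names are assumptions for illustration, not something bitarray prescribes, and io.BytesIO stands in for a real file.

```python
import io
import struct


def write_with_header(stream, header, payload):
    """Write a 4-byte big-endian header length, the header, then the payload."""
    stream.write(struct.pack(">I", len(header)))
    stream.write(header)
    stream.write(payload)


def read_with_header(stream):
    """Read the header, then return it with everything that follows it."""
    (header_len,) = struct.unpack(">I", stream.read(4))
    header = stream.read(header_len)
    # The position is now just past the header, so this read starts there.
    payload = stream.read()
    return header, payload


buffer = io.BytesIO()
write_with_header(buffer, b"{(0, 0): '0'}", b"\xff\x00\xaa")
buffer.seek(0)
header, payload = read_with_header(buffer)
print(header, payload)
```

Going by the documentation quoted above, b.fromfile(file) could take the place of the final stream.read() when the payload is a bitarray.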
I've written code to read a binary file as follows:
file=open('myfile.chn','rb')
i=0
for x in file:
    i=i+1
    print(x)
file.close()
and the result is as follows (a part of it): b'\x00\x00\x80?\x00\x00\x00\x005.xx\x00S\xd4\n'
How can I detect the EOF of this binary file? Let's say I want to call print() after I find the EOF. I tried this, but nothing happened.
if (x=='\n'):
    print()
(updated)
@aix: let's say the file has a few lines of results, just like the example; each line has '\n' at the end and I want to put a space between each line.
b'\x00\x00\x80?\x00\x00\x00\x005.xx\x00S\xd4\n'
b'\x82\x93p\x05\xf6\x8c4S\x00\x00\xaf\x07j\n'
How can I do this?
Once you reach the EOF, the for x in file: loop will terminate.
with open('myfile.chn', 'rb') as f:
    i = 0
    for x in f:
        i += 1
        print(x)
print('reached the EOF')
I've renamed the file variable so that it doesn't shadow the built-in.
NPE's answer is correct, but I feel some additional clarification is necessary.
You tried to detect EOF using something like
if (x=='\n'):
    ...
so probably you are confused the same way I was confused until today.
EOF is NOT a character or byte. It's NOT a value that exists at the end of a file, and it's not something that could exist in the middle of some (even binary) file. In the C world EOF has some value, but even there its value is different from the value of any char (and even its type is not 'char'). But in the Python world EOF means "end of file reached". Python's help for the 'read' function says "... reads and returns all data until EOF", and that does not mean "until an EOF byte is found". It means "until the file is over".
Deeper explanation of what is and what is not a 'EOF' is here:
http://faq.cprogramming.com/cgi-bin/smartfaq.cgi?answer=1048865140&id=1043284351
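A small experiment may make this concrete. In the sketch below, the data contains newline and NUL bytes in the middle, and reading only stops when read() has nothing left to return. The sample bytes are invented, and io.BytesIO stands in for a file opened in 'rb' mode.

```python
import io

# Hypothetical data with newline and NUL bytes in the middle.
stream = io.BytesIO(b"\x00\n\x00\n")

reads = []
while True:
    byte = stream.read(1)
    if not byte:  # b"" means there is nothing left; it is not a stored byte
        break
    reads.append(byte)

after_eof = stream.read(1)  # reading past the end just returns b"" again
print(reads, after_eof)
```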
So I have a program which runs. This is part of the code:
FileName = 'Numberdata.dat'
NumberFile = open(FileName, 'r')
for Line in NumberFile:
    if Line == '4':
        print('1')
    else:
        print('9')
NumberFile.close()
A pretty pointless thing to do, yes, but I'm just doing it to enhance my understanding. However, this code doesn't work. The file remains as it is and the 4's are not replaced by 1's and everything else isn't replaced by 9's, they merely stay the same. Where am I going wrong?
Numberdata.dat is "444666444666444888111000444"
It is now:
FileName = 'Binarydata.dat'
BinaryFile = open(FileName, 'w')
for character in BinaryFile:
    if charcter == '0':
        NumberFile.write('')
    else:
        NumberFile.write('#')
BinaryFile.close()
You need to build up a string and write it to the file.
FileName = 'Numberdata.dat'
NumberFileHandle = open(FileName, 'r')
newFileString = ""
for Line in NumberFileHandle:
    for char in Line:  # this will work for any number of lines.
        if char == '4':
            newFileString += "1"
        elif char == '\n':
            newFileString += char
        else:
            newFileString += "9"
NumberFileHandle.close()

NumberFileHandle = open(FileName, 'w')
NumberFileHandle.write(newFileString)
NumberFileHandle.close()
First, Line will never equal 4 because each line read from the file includes the newline character at the end. Try if Line.strip() == '4'. This will remove all white space from the beginning and end of the line.
Edit: I just saw your edit... naturally, if you have all your numbers on one line, the line will never equal 4. You probably want to read the file a character at a time, not a line at a time.
Second, you're not writing to any file, so naturally the file won't be getting changed. You will run into difficulty changing a file as you read it (since you have to figure out how to back up to the same place you just read from), so the usual practice is to read from one file and write to a different one.
Because you need to write to the file as well.
with open(FileName, 'w') as f:
    f.write(...)
Right now you are just reading and manipulating the data, but you're not writing them back.
At the end you'll need to reopen your file in write mode and write to it.
If you're looking for references, take a look at the open() documentation and at the Reading and Writing Files section of the Python Tutorial.
Edit: You shouldn't read from and write to the same file at the same time. You could either write to a temp file and call shutil.move() at the end, or load and manipulate your data and then re-open your original file in write mode and write it back.
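The temp-file approach might look roughly like the sketch below. It swaps in os.replace() for the final step instead of shutil.move(), because os.replace() overwrites an existing destination on every platform; the helper names rewrite_file and fours_to_ones and the demo file are made up for illustration.

```python
import os
import shutil
import tempfile


def rewrite_file(path, transform):
    """Write transform(line) for each line to a temp file, then replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            dst.write(transform(line))
        temp_path = dst.name
    # Both files are closed here; swap the new file in for the original.
    os.replace(temp_path, path)


def fours_to_ones(line):
    """Map '4' to '1', keep newlines, and turn everything else into '9'."""
    return "".join("1" if c == "4" else c if c == "\n" else "9" for c in line)


# Hypothetical demo file in a scratch directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "Numberdata.dat")
with open(demo_path, "w") as f:
    f.write("444666444\n")

rewrite_file(demo_path, fours_to_ones)
with open(demo_path) as f:
    result = f.read()
shutil.rmtree(demo_dir)
print(repr(result))  # '111999111\n'
```

Creating the temp file in the same directory keeps it on the same filesystem as the original, which the final replace step relies on.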
You are not sending any output to the file; you are simply printing 1 and 9 to stdout, which is usually the terminal or interpreter.
If you want to write to the file you have to use open again with w.
e.g.
out = open(FileName, 'w')
Then you can call out.write('1'), for example.
You can also use (Python 2 syntax; in Python 3 it is print('1', file=out))
print >>out, '1'
Also, if you want to overwrite the file, it is a better idea to read it first and write afterwards.
According to your comment:
Numberdata is just a load of numbers all one line. Maybe that's where I'm going wrong? It is "444666444666444888111000444"
I can tell you that the for loop iterates over lines, not over chars. That is a logic error.
Moreover, you have to write the file, as Rik Poggi said (just remember to open it in write mode).
A few things:
The r flag to open indicates read-only mode. This obviously won't let you write to the file.
print() outputs things to the screen. What you really want to do is output to the file. Have you read the Python File I/O tutorial?
for line in file_handle: loops through files one line at a time. Thus, if line == '4' will only be true if the line consists of a single character, 4, all on its own.
If you want to loop over characters in a string, then do something like for character in line:.
Modifying bits of a file "in place" is a bit harder than you think.
This is because if you insert data into the middle of a file, the rest of the data has to shuffle over to make room - this is really slow because everything after your insertion has to be rewritten.
In theory, a one-byte for one-byte replacement can be done fast, but in general people don't want to replace byte-for-byte, so this is an advanced feature. (See seek().) The usual approach is to just write out a whole new file.
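For the byte-for-byte case, a minimal sketch using seek() could look like this. It assumes the file is small enough to read into memory to locate the bytes; the helper name replace_byte_in_place and the demo file are invented for the example.

```python
import os
import tempfile


def replace_byte_in_place(path, old, new):
    """Overwrite each occurrence of byte `old` with byte `new`; return the count."""
    if len(old) != 1 or len(new) != 1:
        raise ValueError("old and new must each be exactly one byte")
    with open(path, "r+b") as f:  # read/write without truncating
        data = f.read()
        offsets = [i for i, value in enumerate(data) if value == old[0]]
        for offset in offsets:
            f.seek(offset)  # jump to the byte to overwrite
            f.write(new)    # same length, so nothing after it moves
    return len(offsets)


# Hypothetical demo file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"444666444")

replaced = replace_byte_in_place(demo_path, b"4", b"1")
with open(demo_path, "rb") as f:
    contents = f.read()
os.remove(demo_path)
print(replaced, contents)  # 6 b'111666111'
```

Because each write has the same length as the byte it replaces, the file size never changes and no other data has to be shifted.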
Because print doesn't write to your file.
You have to open the file and read it, build a new string from the string you obtain, then open the file again and write the new string to it.
FileName = 'Numberdata.dat'
NumberFile = open(FileName, 'r')
data = NumberFile.read()
NumberFile.close()
dl = data.split('\n')
for i in range(len(dl)):
    if dl[i] == '4':
        dl[i] = '1'
    else:
        dl[i] = '9'
NumberFile = open(FileName, 'w')
NumberFile.write('\n'.join(dl))
NumberFile.close()
Try it this way. There are certainly other methods, but this seems to be the most "linear" to me =)