EOF in a binary file using python - python

I've written some code to read a binary file, as follows:
file=open('myfile.chn','rb')
i=0
for x in file:
    i=i+1
    print(x)
file.close()
and the result is as follows (part of it): b'\x00\x00\x80?\x00\x00\x00\x005.xx\x00S\xd4\n'
How can I detect the EOF of this binary file? Let's say I want to print() after I find the EOF. I tried this, but nothing happened.
if (x=='\n'):
    print()
(updated)
@aix: Let's say that the file has a few lines of results, just like the example. Each line has '\n' at the end, and I want to put a blank line between each line.
b'\x00\x00\x80?\x00\x00\x00\x005.xx\x00S\xd4\n'
b'\x82\x93p\x05\xf6\x8c4S\x00\x00\xaf\x07j\n'
How can I do this?

Once you reach the EOF, the for x in file: loop will terminate.
with open('myfile.chn', 'rb') as f:
    i = 0
    for x in f:
        i += 1
        print(x)
    print('reached the EOF')
I've renamed the file variable so that it doesn't shadow the built-in.

NPE's answer is correct, but I feel some additional clarification is necessary.
You tried to detect EOF using something like
if (x=='\n'):
...
so probably you are confused the same way I was confused until today.
EOF is NOT a character or byte. It is NOT a value that exists at the end of a file, and it is not something that could exist in the middle of a file (even a binary one). In the C world EOF has a value, but even there its value is different from the value of any char (and its type is not 'char' either). In the Python world, EOF simply means "end of file reached". The Python help for the 'read' function says "... reads and returns all data until EOF", and that does not mean "until an EOF byte is found". It means "until the file is over".
A deeper explanation of what is and what is not an 'EOF' is here:
http://faq.cprogramming.com/cgi-bin/smartfaq.cgi?answer=1048865140&id=1043284351
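To make the point concrete, here is a minimal sketch. The helper name count_bytes_until_eof and the chunk size are made up for illustration, and an in-memory io.BytesIO stands in for the real file. EOF shows up as read() returning an empty bytes object, not as a special byte inside the data.

```python
import io

def count_bytes_until_eof(stream, chunk_size=4):
    """Read a binary stream chunk by chunk until read() signals EOF."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if chunk == b'':      # an empty result means nothing is left, i.e. EOF
            break
        total += len(chunk)
    return total

# In-memory stand-in for a binary file; the data itself contains \n and \x00 bytes.
data = b'\x00\x00\x80?\n\x82\x93p\x05\n'
total = count_bytes_until_eof(io.BytesIO(data))
print(total)
```

Bytes such as \n or \x00 in the middle of the data do not stop the loop; only the empty read does.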

Related

Appending a text file to a text file

I've run into an error. I've been trying to append a text file to itself like so:
file_obj = open("text.txt", "a+")
number = 6
def appender(obj, num):
    count = 0
    while count<=num:
        read = file_obj.read()
        file_obj.seek(0,2)
        file_obj.write(read)
        count+=1
appender(file_obj, number)
However, the text.txt file is then filled with strange ASCII symbols. At first, the file contains only a simple "hello", but after the code, it contains this:
hellohello䀀 猀· d娀 Ť搀Ŭ娀ͤ攀ɪ昀Ѥ萀 夀ɚ搀ť樀Ŧ搀茀 婙ݤ攀Ѫ昀ࡤ萀 夀њ搀
ɥ攀ժ昀൤
茀 婙୤攀ť樀ɦ搀茀 婙൤萀 ݚ搀࡚攀४攀ƃ娀਍搀⡓ 癳  祐桴湯䌠慨慲瑣牥䴠灡楰杮
䌠摯捥挠ㅰ㔲‰敧敮慲整⁤牦浯✠䅍偐义升嘯久佄卒䴯䍉䙓⽔䥗䑎坏⽓偃㈱〵吮员‧楷桴朠湥潣敤⹣祰
മഊ椊 and so on.
Any help will be appreciated
I think I can fix your problem, even though I can't reproduce it. There's a logic error: after you write, you fail to return to the start of the file for reading. In terms of analysis, you failed to do anything to diagnose the problem. At the very least, use a print statement to see what you're reading: that highlights the problem quite well. Here's the loop I used:
count = 0
while count<=num:
    file_obj.seek(0) # Read from the beginning of the file.
    read = file_obj.read()
    print(count, read) # Trace what we're reading.
    file_obj.seek(0, 2)
    file_obj.write(read)
    count+=1
This gives the expected output of 128 (2^(6+1)) repetitions of "hello".
EXTENSIONS
I recommend that you learn to use both the for loop and the with open ... as idiom. These will greatly shorten your program and improve the readability.
I am using this code and everything is working as expected:
with open("file.txt", "r+") as f:
    lines = f.readlines() # read everything first; the position is now at the end
    f.writelines(lines) # so this appends a copy of the contents
You just have the wrong mode: use 'r+' rather than 'a+'. See this link for a list of modes and an explanation of reading files.
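The advice in these answers (rewind before each read, use 'r+', use a with block and a for loop) can be combined into one small sketch. The function name append_to_self and the temporary file are assumptions made for this example only.

```python
import os
import tempfile

def append_to_self(path, times):
    """Double the file's contents `times` times by appending it to itself."""
    with open(path, "r+") as f:
        for _ in range(times):
            f.seek(0)            # go back to the start before reading
            contents = f.read()  # the position is now at the end of the file
            f.write(contents)    # so this write appends

# Demo on a throwaway file created just for this example.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w") as f:
    f.write("hello")
append_to_self(demo_path, 3)
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(result.count("hello"))
```

Each pass doubles the contents, so three passes leave eight copies of "hello".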

How to read user input until EOF in python?

I came across this problem in UVa OJ. 272-Text Quotes
Well, the problem is quite trivial. But the thing is I am not able to read the input. The input is provided in the form of text lines and end of input is indicated by EOF.
In C/C++ this can be done by running a while loop:
while( scanf("%s",&s)!=EOF ) {
    //do something
}
How can this be done in Python?
I have searched the web but I did not find any satisfactory answer.
Note that the input must be read from the console and not from a file.
You can use sys module:
import sys
complete_input = sys.stdin.read()
sys.stdin is a file-like object that you can treat like any other Python file object.
From the documentation:
Help on built-in function read:
read(size=-1, /) method of _io.TextIOWrapper instance
    Read at most n characters from stream.
    Read from underlying buffer until we have n characters or we hit EOF.
    If n is negative or omitted, read until EOF.
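As a rough Python counterpart of the C scanf("%s") loop from the question, the sketch below collects whitespace-separated tokens until EOF. The function name read_words is hypothetical, and io.StringIO stands in for console input so the example runs anywhere; in a real submission you would call read_words() with the default sys.stdin.

```python
import io
import sys

def read_words(stream=sys.stdin):
    """Collect whitespace-separated tokens from a stream until EOF."""
    words = []
    for line in stream:            # iteration stops by itself at EOF
        words.extend(line.split())
    return words

# io.StringIO plays the role of stdin here.
fake_stdin = io.StringIO("first line\nsecond line\n")
words = read_words(fake_stdin)
print(words)
```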
You can read input from the console until end of file using the sys and os modules in Python. I have used these methods in online judges like SPOJ several times.
First method (recommended):
from sys import stdin
for line in stdin:
    if line == '': # If empty string is read then stop the loop
        break
    process(line) # perform some operation(s) on given string
Note that there will be a newline character \n at the end of every line you read. To avoid printing two newlines when printing line, use print(line, end='').
Second method:
import os
# here 0 is the file descriptor of stdin and 10**6 is the maximum number of bytes to read.
lines = os.read(0, 10**6).strip().splitlines()
for x in lines:
    line = x.decode('utf-8') # convert bytes-like object to string
    print(line)
This method does not work on all online judges but it is the fastest way to read input from a file or console.
Third method:
while True:
    line = input()
    if line == '': # stops at the first blank line; at a real EOF input() raises EOFError instead
        break
    process(line)
Replace input() with raw_input() if you're still using Python 2.
For the HackerRank and HackerEarth platforms, the implementation below is preferred:
while True:
    try:
        line = input()
        ...
    except EOFError:
        break
This is how you can do it:
while True:
    try:
        line = input()
        ...
    except EOFError:
        break # stop at EOF; with pass here the loop would never end
If you need to read one character on the keyboard at a time, you can see an implementation of getch in Python: Python read a single character from the user

confused about the readline() return in Python

I am a beginner in python. However, I have some problems when I try to use the readline() method.
f=raw_input("filename> ")
a=open(f)
print a.read()
print a.readline()
print a.readline()
print a.readline()
and my txt file is
aaaaaaaaa
bbbbbbbbb
ccccccccc
However, when I tried to run it on a Mac terminal, I got this:
aaaaaaaaa
bbbbbbbbb
ccccccccc
It seems that readline() is not working at all.
But when I disable print a.read(), the readline() gets back to work.
This confuses me a lot. Is there any solution where I can use read() and readline() at the same time?
When you open a file you get a pointer to some place in the file (by default: the beginning). Whenever you run .read() or .readline(), this pointer moves:
.read() reads until the end of the file and moves the pointer to the end (thus further calls to any reading gives nothing)
.readline() reads until newline is seen and sets the pointer after it
.read(X) reads X bytes and sets the pointer at CURRENT_LOCATION + X (or the end)
If you wish you can manually move that pointer by issuing a.seek(X) call where X is a place in file (seen as an array of bytes). For example this should give you the desired output:
print a.read()
a.seek(0)
print a.readline()
print a.readline()
print a.readline()
You need to understand the concept of file pointers. When you read the file, it is fully consumed, and the pointer is at the end of the file.
"It seems that the readline() is not working at all."
It is working as expected. There are no lines to read.
"when I disable print a.read(), the readline() gets back to work."
Because the pointer is at the beginning of the file, and the lines can be read.
"Is there any solution that I can use read() and readline() at the same time?"
Sure. Read a few lines first and then the remainder of the file, or seek the file pointer back to a position that you would like.
Also, don't forget to close the file when you are finished reading it.
The file object a remembers its position in the file.
a.read() reads from the current position to end of the file (moving the position to the end of the file)
a.readline() reads from the current position to the end of the line (moving the position to the next line)
a.seek(n) moves to position n in the file (without returning anything)
a.tell() returns the position in the file.
So try putting the calls to readline first. You'll notice that the read call then won't return the whole file, just the remaining lines (maybe none), depending on how many times you called readline. Play around with seek and tell to confirm what's going on.
Details here.
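The pointer behaviour described in these answers can be checked with a short experiment. This is a sketch in Python 3 (the question uses Python 2 print statements), and io.StringIO stands in for the three-line text file from the question.

```python
import io

# In-memory stand-in for the three-line text file from the question.
f = io.StringIO("aaaaaaaaa\nbbbbbbbbb\nccccccccc\n")

everything = f.read()        # consumes the whole file
after_read = f.readline()    # the position is at the end, so this is ''
end_position = f.tell()      # position after reading everything

f.seek(0)                    # rewind to the beginning
first_line = f.readline()    # the first line, including its newline
rest = f.read()              # only what is left after the first line

print(repr(after_read), end_position, repr(first_line), repr(rest))
```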

Python conditional statement based on text file string

Noob question here. I'm scheduling a cron job for a Python script for every 2 hours, but I want the script to stop running after 48 hours, which is not a feature of cron. To work around this, I'm recording the number of executions at the end of the script in a text file using a tally mark x and opening the text file at the beginning of the script to only run if the count is less than n.
However, my script seems to always run regardless of the conditions. Here's an example of what I've tried:
with open("curl-output.txt", "a+") as myfile:
    data = myfile.read()
    finalrun = "xxxxx"
    if data != finalrun:
        [CURL CODE]
    with open("curl-output.txt", "a") as text_file:
        text_file.write("x")
        text_file.close()
I think I'm missing something simple here. Please advise if there is a better way of achieving this. Thanks in advance.
The problem with your original code is that you're opening the file in a+ mode, which seems to set the seek position to the end of the file (try print(data) right after you read the file). If you use r instead, it works. (I'm not sure that's how it's supposed to be. This answer states it should write at the end, but read from the beginning. The documentation isn't terribly clear).
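The claim about the seek position is easy to test. The following sketch, run against a throwaway temporary file rather than the real curl-output.txt, reads once right after opening in a+ mode and once after seeking back to the start. It reflects the behaviour I would expect from CPython 3; treat it as an experiment, not as a specification.

```python
import os
import tempfile

# Create a small file to experiment on.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
    f.write("xxx")

with open(path, "a+") as f:
    read_immediately = f.read()   # the position starts at the end of the file
    f.seek(0)                     # move back to the beginning
    read_after_seek = f.read()    # now the contents are visible

os.remove(path)
print(repr(read_immediately), repr(read_after_seek))
```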
Some suggestions: Instead of comparing against the "xxxxx" string, you could just check the length of the data (if len(data) < 5). Or alternatively, as was suggested, use pickle to store a number, which might look like this:
import pickle
try:
    with open("curl-output.txt", "rb") as myfile:
        num = pickle.load(myfile)
except FileNotFoundError:
    num = 0
if num < 5:
    do_curl_stuff()
    num += 1
    with open("curl-output.txt", "wb") as myfile:
        pickle.dump(num, myfile)
Two more things concerning your original code: You're making the first with block bigger than it needs to be. Once you've read the string into data, you don't need the file object anymore, so you can remove one level of indentation from everything except data = myfile.read().
Also, you don't need to close text_file manually. with will do that for you (that's the point).
This sounds more like a job for scheduling with the at command.
See http://www.ibm.com/developerworks/library/l-job-scheduling/ for different job scheduling mechanisms.
The first bug that is immediately obvious to me is that you are appending to the file even if data == finalrun. So when data == finalrun, you don't run curl but you do append another 'x' to the file. On the next run, data will be not equal to finalrun again so it will continue to execute the curl code.
The solution is of course to nest the code that appends to the file under the if statement.
Well, there is probably a trailing newline character \n, so your file contains something like xx\n and not simply xx. This is probably why your condition does not work :)
EDIT
What happens if you type the following at the Python command line?
open('filename.txt', 'r').read() # where filename is the name of your file
You will be able to see whether there is a \n or not.
Try using this condition in the if clause instead:
if data.count('x')==24:
The data string may contain extraneous characters such as newlines. Check repr(data) to see whether it is actually 24 x's.
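Putting the suggestions from these answers together (read in a mode that starts at the beginning, count the tally marks instead of comparing strings, and only record a run that actually happened), a sketch could look like this. The function name run_if_under_limit, the job callback, and the temporary directory are all assumptions made for illustration.

```python
import os
import tempfile

def run_if_under_limit(path, limit, job):
    """Run job() and record a tally mark unless `limit` runs are already recorded."""
    try:
        with open(path, "r") as f:
            runs = f.read().count("x")   # counting ignores stray newlines or spaces
    except FileNotFoundError:
        runs = 0                         # no tally file yet: nothing has run
    if runs >= limit:
        return False
    job()
    with open(path, "a") as f:           # only tally runs that actually happened
        f.write("x")
    return True

# Demo: allow 3 runs, attempt 5.
with tempfile.TemporaryDirectory() as tmp_dir:
    tally_path = os.path.join(tmp_dir, "curl-output.txt")
    calls = []
    results = [run_if_under_limit(tally_path, 3, lambda: calls.append("ran"))
               for _ in range(5)]

print(results, len(calls))
```

For the cron scenario in the question (every 2 hours for 48 hours), the limit would be 24.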

File contents not as long as expected

with open(sourceFileName, 'rt') as sourceFile:
    sourceFileConents = sourceFile.read()
    sourceFileConentsLength = len(sourceFileConents)
    i = 0
    while i < sourceFileConentsLength:
        print(str(i) + ' ' + sourceFileConents[i])
        i += 1
Please forgive the unPythonic for i loop, this is only the test code & there are reasons to do it that way in the real code.
Anyhoo, the real code seemed to be ending the loop sooner than expected, so I knocked up the dummy above, which removes all of the logic of the real code.
The sourceFileConentsLength reports as 13,690, but when I print it out char for char, there are still a few 100 chars more in the file, which are not being printed out.
What gives?
Should I be using something other than <fileHandle>.read() to get the file's entire contents into a single string?
Have I hit some maximum string length? If so, can I get around it?
Might it be line endings if the file was edited in Windows & the script is run in Linux (sorry, I can't post the file, it's company confidential)
What else?
[Update] I think that we can strike two of those ideas.
For maximum string length, see this question.
I did an ls -lAF to a temp directory. Only 6k+ chars, but the script handled it just fine. Should I be worrying about line endings? If so, what can I do about it? The source files tend to get edited under both Windows & Linux, but the script will only run under Linux.
[Update++] I changed the line endings on my input file to Linux in Eclipse, but still got the same result.
If you read a file in text mode it will automatically convert line endings like \r\n to \n.
Try using
with open(sourceFileName, newline='') as sourceFile:
instead; this will turn off newline-translation (\r\n will be returned as \r\n).
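The effect of newline translation on the reported length can be seen with a tiny experiment. This sketch writes a temporary file with Windows line endings (the file and its contents are made up for the example) and reads it back both ways.

```python
import os
import tempfile

# Write 8 bytes with Windows line endings to a throwaway file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"ab\r\ncd\r\n")

with open(path, "rt") as f:               # default: \r\n is translated to \n
    translated = f.read()
with open(path, "rt", newline="") as f:   # translation turned off
    untranslated = f.read()

os.remove(path)
print(len(translated), len(untranslated))
```

Each \r\n pair that is translated makes the string one character shorter than the file size on disk.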
If your file is encoded in something like UTF-8, you should decode it before counting the characters:
with open(sourceFileName, 'rb') as sourceFile:
    sourceFileContents_utf8 = sourceFile.read() # raw bytes, not yet decoded
sourceFileContents_unicode = sourceFileContents_utf8.decode('utf8')
print(len(sourceFileContents_unicode))
i = 0
source_file_contents_length = len(sourceFileContents_unicode)
while i < source_file_contents_length:
    print('%s %s' % (str(i), sourceFileContents_unicode[i]))
    i += 1
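Why decoding matters for the count: in UTF-8, a non-ASCII character occupies more than one byte, so the byte length and the character length of the same text differ. A minimal sketch with a made-up sample string:

```python
# "naïve café", written with escapes so the source stays pure ASCII.
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")   # the bytes as they would sit in a file

print(len(text), len(encoded))
```

The two accented letters take two bytes each, which accounts for the difference between the two lengths.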
