Python binary EOF

I want to read through a binary file.
Googling "python binary eof" led me here.
Now, the questions:
Why does the container (x in the SO answer) contain not a single (current) byte but a whole bunch of them? What am I doing wrong?
If it should be so and I am doing nothing wrong, HOW do I read a single byte? I mean, is there any way to detect EOF while reading the file with the read(1) method?

To quote the documentation:
file.read([size])
Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread() more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.
That means (for a regular file):
f.read(1) will return a bytes object containing either 1 byte, or 0 bytes if EOF was reached immediately.
f.read(2) will return a bytes object containing either 2 bytes, or 1 byte if EOF is reached after the first byte, or 0 bytes if EOF is encountered immediately.
...
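These semantics are easy to check against an in-memory binary stream (io.BytesIO standing in for a real file):

```python
import io

buf = io.BytesIO(b"abc")     # a 3-byte "file"
assert buf.read(2) == b"ab"  # 2 bytes available: full read
assert buf.read(2) == b"c"   # EOF hit after 1 byte: short read
assert buf.read(2) == b""    # at EOF: empty bytes object
assert buf.read(1) == b""    # and it stays empty on further reads
```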
If you want to read your file one byte at a time, you will have to read(1) in a loop and test for "emptiness" of the result:
# From answer by @Daniel
with open(filename, 'rb') as f:
    while True:
        b = f.read(1)
        if not b:
            # eof
            break
        do_something(b)
If you want to read your file in "chunks" of, say, 50 bytes at a time, you will have to read(50) in a loop:
with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        if not b:
            # eof
            break
        do_something(b)  # <- be prepared to handle a last chunk of length < 50
                         #    if the file length *is not* a multiple of 50
In fact, you may even break one iteration sooner:
with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        do_something(b)  # <- be prepared to handle a last chunk of size 0
                         #    if the file length *is* a multiple of 50
                         #    (incl. a 0 byte-length file!)
                         #    and be prepared to handle a last chunk of length < 50
                         #    if the file length *is not* a multiple of 50
        if len(b) < 50:
            break
Concerning the other part of your question:
Why does the container [..] contain [..] a whole bunch of them [bytes]?
Referring to that code:
for x in file:
i=i+1
print(x)
To quote again the doc:
A file object is its own iterator, [..]. When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing).
The code above reads a binary file line by line; that is, stopping at each occurrence of the EOL char (\n). Usually, that leads to chunks of various lengths, as most binary files contain occurrences of that char randomly distributed.
I wouldn't encourage you to read a binary file that way. Please prefer a solution based on read(size).
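To see why, here is what iteration yields on a small in-memory binary stream: the chunks are split wherever a \n byte happens to occur.

```python
import io

data = b"\x00\x01\n\x02\x03\x04\n\x05"
chunks = list(io.BytesIO(data))  # iterating yields \n-terminated "lines"
assert chunks == [b"\x00\x01\n", b"\x02\x03\x04\n", b"\x05"]
```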

"" will signify the end of the file
with open(filename, 'rb') as f:
for ch in iter(lambda: f.read(1),""): # keep calling f.read(1) until end of the data
print ch


Here is what I did. The call to read returns a falsy value when it encounters the end of the file, and this terminates the loop. Using while ch != "": copied the image but it gave me a hung loop (in binary mode, read returns bytes, which never compare equal to the empty str "").
from sys import argv

donor = argv[1]
recipient = argv[2]

# read from donor and write into recipient;
# when the with statement ends, the files get closed
with open(donor, "rb") as fp_in:
    with open(recipient, "wb") as fp_out:
        ch = fp_in.read(1)
        while ch:
            fp_out.write(ch)
            ch = fp_in.read(1)


How to write eof in Python? [duplicate]

fp = open("a.txt")
#do many things with fp
c = fp.read()
if c is None:
print 'fp is at the eof'
Besides the above method, any other way to find out whether is fp is already at the eof?
fp.read() reads up to the end of the file, so after it's successfully finished you know the file is at EOF; there's no need to check. If it cannot reach EOF it will raise an exception.
When reading a file in chunks rather than with read(), you know you've hit EOF when read returns less than the number of bytes you requested. In that case, the following read call will return the empty string (not None). The following loop reads a file in chunks; it will call read at most once too many.
assert n > 0
while True:
    chunk = fp.read(n)
    if chunk == '':
        break
    process(chunk)
Or, shorter:
for chunk in iter(lambda: fp.read(n), ''):
    process(chunk)
The "for-else" design is often overlooked. See: Python Docs "Control Flow in Loop":
Example
with open('foobar.file', 'rb') as f:
    for line in f:
        foo()
    else:
        # No more lines to be read from file
        bar()
I'd argue that reading from the file is the most reliable way to establish whether it contains more data. It could be a pipe, or another process might be appending data to the file etc.
If you know that's not an issue, you could use something like:
f.tell() == os.fstat(f.fileno()).st_size
As Python returns an empty string on EOF, and not "EOF" itself, you can just check for it in the code, as written here:
f1 = open("sample.txt")
while True:
line = f1.readline()
print line
if ("" == line):
print "file finished"
break;
When doing binary I/O the following method is useful:
while f.read(1):
    f.seek(-1, 1)
    # whatever
The advantage is that sometimes you are processing a binary stream and do not know in advance how much you will need to read.
You can compare the returned value of fp.tell() before and after calling the read method. If they return the same value, fp is at EOF.
Furthermore, I don't think your example code actually works. The read method to my knowledge never returns None, but it does return an empty string on EOF.
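A small sketch of that tell()-comparison approach (the helper name at_eof is my own):

```python
def at_eof(fp):
    """True if fp is positioned at end-of-file, False otherwise."""
    before = fp.tell()
    fp.read(1)           # try to consume one unit (byte or char)
    after = fp.tell()
    if after != before:  # something was read: not at EOF,
        fp.seek(before)  # so restore the original position
    return before == after
```

This works for both text and binary files, since it only relies on tell() and seek().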
Here is a way to do this with the Walrus Operator (new in Python 3.8)
f = open("a.txt", "r")
while (c := f.read(n)):
process(c)
f.close()
Useful Python Docs (3.8):
Walrus operator: https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions
Methods of file objects: https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
read returns an empty string when EOF is encountered. Docs are here.
f = open(file_name)
for line in f:
    print line
I really don't understand why Python still doesn't have such a function. I also don't agree with using the following
f.tell() == os.fstat(f.fileno()).st_size
The main reason is that f.tell() is not likely to work under some special conditions.
The method works for me is like the following. If you have some pseudocode like the following
while not EOF(f):
    line = f.readline()
    " do something with line"
You can replace it with:
lines = iter(f.readlines())
while True:
    try:
        line = next(lines)
        " do something with line"
    except StopIteration:
        break
This method is simple and you don't need to change most of your code.
If the file is opened in non-blocking mode, getting back fewer bytes than expected does not mean it's at EOF; I'd say @NPE's answer is the most reliable way:
f.tell() == os.fstat(f.fileno()).st_size
The Python read functions will return an empty string if they reach EOF
f = open(filename, 'r')
f.seek(-1, 2)   # go to the file end.
eof = f.tell()  # get the end of file location
f.seek(0, 0)    # go back to file beginning
while f.tell() != eof:
    <body>
You can use the file methods seek() and tell() to determine the position of the end of file. Once the position is found, seek back to the file beginning
Python doesn't have a built-in EOF detection function, but that functionality is available in two ways: f.read(1) will return b'' if there are no more bytes to read. This works for text as well as binary files. The second way is to use f.tell() to see if the current seek position is at the end. If you want EOF testing not to change the current file position then you need a bit of extra code.
Below are both implementations.
Using tell() method
import os

def is_eof(f):
    cur = f.tell()            # save current position
    f.seek(0, os.SEEK_END)
    end = f.tell()            # find the size of file
    f.seek(cur, os.SEEK_SET)
    return cur == end
Using read() method
def is_eof(f):
    s = f.read(1)
    if s != b'':  # restore position
        f.seek(-1, os.SEEK_CUR)
    return s == b''
How to use this
while not is_eof(my_file):
    val = my_file.read(10)
Play with this code.
You can use the tell() method after reaching EOF by calling the readlines()
method, like this:
fp = open('file_name', 'r')
lines = fp.readlines()
eof = fp.tell()        # here we store the pointer
                       # indicating the end of the file in eof
fp.seek(0)             # we bring the cursor to the beginning of the file
if eof != fp.tell():   # we check if the cursor
    do_something()     # has reached the end of the file
Reading a file in batches of BATCH_SIZE lines (the last batch can be shorter):
BATCH_SIZE = 1000  # lines

with open('/path/to/a/file') as fin:
    eof = False
    while eof is False:
        # We use an iterator to check later if it was fully realized. This
        # is a way to know if we reached the EOF.
        # NOTE: file.tell() can't be used with iterators.
        batch_range = iter(range(BATCH_SIZE))
        acc = [line for (_, line) in zip(batch_range, fin)]
        # DO SOMETHING WITH "acc"
        # If we still have something to iterate, we have read the whole
        # file.
        if any(batch_range):
            eof = True
Get the EOF position of the file:
def get_eof_position(file_handle):
    original_position = file_handle.tell()
    eof_position = file_handle.seek(0, 2)
    file_handle.seek(original_position)
    return eof_position
and compare it with the current position: get_eof_position(file_handle) == file_handle.tell().
Although I would personally use a with statement to handle opening and closing a file, in the case where you have to read from stdin and need to track an EOF exception, do something like this:
Use a try-catch with EOFError as the exception:
import sys

try:
    input_lines = ''
    for line in sys.stdin.readlines():
        input_lines += line
except EOFError as e:
    print e
I use this function:
import os

# Returns True if End-Of-File is reached
def EOF(f):
    current_pos = f.tell()
    file_size = os.fstat(f.fileno()).st_size
    return current_pos >= file_size
This code will work for Python 3 and above
file = open("filename.txt")
f = file.readlines()  # reads all lines from the file
EOF = -1              # represents end of file
temp = 0
for k in range(len(f) - 1, -1, -1):
    if temp == 0:
        if f[k] == "\n":
            EOF = k
        else:
            temp += 1
print("Given file has", EOF, "lines")
file.close()
You can try this code:
import sys

sys.stdin = open('input.txt', 'r')  # set std input to 'input.txt'
count_lines = 0
while True:
    try:
        v = input()  # if EOF, it will raise an error
        count_lines += 1
    except EOFError:
        print('EOF', count_lines)  # print number of lines in file
        break
You can use the below code snippet to read line by line, till end of file:
line = obj.readline()
while line != '':
    # Do Something
    line = obj.readline()

Reading a file until a specific character in python

I am currently working on an application which requires reading all the input from a file until a certain character is encountered.
By using the code:
file=open("Questions.txt",'r')
c=file.readlines()
c=[x.strip() for x in c]
Every time strip encounters \n, it is removed from the input and treated as a string in list c.
This means every line is split into the part of a list c. But I want to make a list up to a point whenever a special character is encountered like this:
if the input file has the contents:
1.Hai
2.Bye\-1
3.Hello
4.OAPd\-1
then I want to get a list as
c=['1.Hai\n2.Bye','3.Hello\n4.OApd']
Please help me in doing this.
The easiest way would be to read the file in as a single string and then split it across your separator:
with open('myFileName') as myFile:
    text = myFile.read()
result = text.split(separator)  # use your \-1 (whatever that means) here
In case your file is very large, holding the complete contents in memory as a single string for using .split() is maybe not desirable (and then holding the complete contents in the list after the split is probably also not desirable). Then you could read it in chunks:
CHUNK_SIZE = 4096  # I propose 4096 or so

def each_chunk(stream, separator):
    buffer = ''
    while True:  # until EOF
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:  # EOF?
            yield buffer
            break
        buffer += chunk
        while True:  # until no separator is found
            try:
                part, buffer = buffer.split(separator, 1)
            except ValueError:
                break
            else:
                yield part

with open('myFileName') as myFile:
    for chunk in each_chunk(myFile, separator='\\-1\n'):
        print(chunk)  # not holding in memory, but printing chunk by chunk
I used "*" instead of "-1", I'll let you make the appropriate changes.
s = '1.Hai\n2.Bye*3.Hello\n4.OAPd*'
temp = ''
results = []
for char in s:
if char is '*':
results.append(temp)
temp = []
else:
temp += char
if len(temp) > 0:
results.append(temp)

Convert Little-endian 24 bits file to an ASCII array

I have a .raw file containing a 52-line HTML header followed by the data themselves. The file is encoded as little-endian 24-bit SIGNED integers and I want to convert the data to integers in an ASCII file. I use Python 3.
I tried to 'unpack' the entire file with the following code found in this post:
import sys
import chunk
import struct
f1 = open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode = 'rb')
data = struct.unpack('<i', chunk + ('\0' if chunk[2] < 128 else '\xff'))
But I get this error message:
TypeError: 'module' object is not subscriptable
EDIT
It seems this is better:
data = struct.unpack('<i','\0'+ bytes)[0] >> 8
But I still get an error message:
TypeError: must be str, not type
Easy to fix I presume?
That's not a nice file to process in Python! Python is great for processing text files, because it reads them in big chunks in an internal buffer and then iterates on lines, but you cannot easily access binary data that comes after text read like that. Additionally, the struct module has no support for 24 bits values.
The only way I can imagine is to read the file one byte at a time: first skip past 52 end-of-line characters, then read bytes 3 at a time, concatenate them into a 4-byte byte string, and unpack it.
Possible code could be:
eol = b'\n'  # or whatever is the end of line in your file
nlines = 52  # number of lines to skip

with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:
    for i in range(nlines):  # process nlines lines
        t = b''              # to store the content of each line
        while True:
            x = f1.read(1)   # one byte at a time
            if x == eol:     # ok we have one full line
                break
            else:
                t += x       # else concatenate into current line
        print(t)             # to control the initial 52 lines
    while True:
        t = bytes((0,))      # struct only knows how to process 4-byte ints,
        for i in range(3):   # so build one starting with a null byte
            t += f1.read(1)
        # print(t)
        if len(t) == 1:      # reached end of file
            break
        if len(t) < 4:       # reached end of file with incomplete value
            print("Remaining bytes at end of file", t)
            break
        # the trick is that the integer division by 256 skips the initial
        # 0 byte and keeps the sign
        i = struct.unpack('<i', t)[0] // 256  # // for Python 3, only / for Python 2
        print(i, hex(i))     # or any other more useful processing
Remark: the above code assumes that your description of 52 lines (each terminated by an end of line) is true, but the image shown suggests that the last line is not. In that case, you should first count 51 lines and then skip the content of the last line.
def skiplines(fd, nlines, eol):
    for i in range(nlines):  # process nlines lines
        t = b''              # to store the content of each line
        while True:
            x = fd.read(1)   # one byte at a time
            if x == eol:     # ok we have one full line
                break
            else:
                t += x       # else concatenate into current line
        # print(t)           # to control the initial 52 lines

with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:
    skiplines(f1, 51, b'\n')  # skip 51 lines terminated with a \n
    skiplines(f1, 1, b'>')    # skip last line assuming it ends at the >
    ...
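As a side note, on Python 3 the sign-extension trick is not needed at all: int.from_bytes decodes a signed little-endian 24-bit value directly. A possible sketch (the helper name read_int24_le is mine):

```python
def read_int24_le(f):
    """Read one little-endian signed 24-bit integer; None at EOF."""
    b = f.read(3)
    if len(b) < 3:  # EOF, or a truncated trailing value
        return None
    return int.from_bytes(b, "little", signed=True)
```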

Python: Seeking to EOL in file not working

I have this method:
def get_chunksize(path):
    """
    Breaks a file into chunks and yields the chunk sizes.
    Number of chunks equals the number of available cores.
    Ensures that each chunk ends at an EOL.
    """
    size = os.path.getsize(path)
    cores = mp.cpu_count()
    chunksize = size / cores  # gives truncated integer
    f = open(path)
    while 1:
        start = f.tell()
        f.seek(chunksize, 1)  # Go to the next chunk
        s = f.readline()      # Ensure the chunk ends at the end of a line
        yield start, f.tell() - start
        if not s:
            break
It is supposed to break a file into chunks and return the start of the chunk (in bytes) and the chunk size.
Crucially, the end of a chunk should correspond to the end of a line (which is why the f.readline() behaviour is there), but I am finding that my chunks are not seeking to an EOL at all.
The purpose of the method is to then read chunks which can be passed to a csv.reader instance (via StringIO) for further processing.
I've been unable to spot anything obviously wrong with the function...any ideas why it is not moving to the EOL?
I came up with this rather clunky alternative:
def line_chunker(path):
    size = os.path.getsize(path)
    cores = mp.cpu_count()
    chunksize = size / cores  # gives truncated integer
    f = open(path)
    while True:
        part = f.readlines(chunksize)
        yield csv.reader(StringIO("".join(part)))
        if not part:
            break
This will split the file into chunks with a csv reader for each chunk, but the last chunk is always empty (??) and having to join the list of strings back together is rather clunky.
if not s:
    break
Instead of looking at s to see if you're at the end of the file, you should check whether you've reached the end of the file by using:
if size == f.tell(): break
this should fix it. I wouldn't depend on a CSV file having a single record per line though. I've worked with several CSV files that have strings with new-lines:
first,last,message
sue,ee,hello
bob,builder,"hello,
this is some text
that I entered"
jim,bob,I'm not so creative...
Notice the 2nd record (bob) spans across 3 lines. csv.reader can handle this. If the idea is to do some CPU-intensive work on a CSV, I'd create an array of threads, each with a buffer of n records, have the csv.reader pass a record to each thread using round-robin, and skip a thread if its buffer is full.
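To confirm that csv.reader really does cope with the quoted multi-line record above, a quick check:

```python
import csv
import io

text = (
    "first,last,message\n"
    "sue,ee,hello\n"
    'bob,builder,"hello,\n'
    "this is some text\n"
    'that I entered"\n'
    "jim,bob,I'm not so creative...\n"
)
rows = list(csv.reader(io.StringIO(text)))
assert len(rows) == 4  # header + 3 records, despite 6 physical lines
assert rows[2] == ["bob", "builder", "hello,\nthis is some text\nthat I entered"]
```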
Hope this helps - enjoy.
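For completeness, here is a sketch of the generator with that fix applied; I open in binary mode so the relative seek also works on Python 3, and use // so chunksize stays an integer:

```python
import multiprocessing as mp
import os

def get_chunksize(path):
    """Yield (start, length) pairs; every chunk ends at an EOL."""
    size = os.path.getsize(path)
    chunksize = size // mp.cpu_count()  # truncated integer
    with open(path, "rb") as f:
        while True:
            start = f.tell()
            f.seek(chunksize, 1)        # go toward the next chunk...
            f.readline()                # ...then finish the current line
            end = min(f.tell(), size)   # clamp if we seeked past EOF
            yield start, end - start
            if end >= size:
                break
```

Each yielded chunk starts exactly where the previous one ended, so the pairs can be handed to workers that read their own slice of the file.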

What is the most efficient way to get first and last line of a text file?

I have a text file which contains a time stamp on each line. My goal is to find the time range. All the times are in order so the first line will be the earliest time and the last line will be the latest time. I only need the very first and very last line. What would be the most efficient way to get these lines in python?
Note: These files are relatively large in length, about 1-2 million lines each and I have to do this for several hundred files.
To read both the first and final line of a file you could...
open the file, ...
... read the first line using built-in readline(), ...
... seek (move the cursor) to the end of the file, ...
... step backwards until you encounter EOL (line break) and ...
... read the last line from there.
def readlastline(f):
    f.seek(-2, 2)              # Jump to the second last byte.
    while f.read(1) != b"\n":  # Until EOL is found ...
        f.seek(-2, 1)          # ... jump back, over the read byte plus one more.
    return f.read()            # Read all data from this point on.

with open(file, "rb") as f:
    first = f.readline()
    last = readlastline(f)
Jump to the second last byte directly to prevent trailing newline characters from causing empty lines to be returned*.
The current offset is pushed ahead by one every time a byte is read so the stepping backwards is done two bytes at a time, past the recently read byte and the byte to read next.
The whence parameter passed to fseek(offset, whence=0) indicates that fseek should seek to a position offset bytes relative to...
0 or os.SEEK_SET = The beginning of the file.
1 or os.SEEK_CUR = The current position.
2 or os.SEEK_END = The end of the file.
* As would be expected, since the default behavior of most applications, including print and echo, is to append a newline to every line written; it has no effect on lines missing a trailing newline character.
Efficiency
1-2 million lines each and I have to do this for several hundred files.
I timed this method and compared it against the top answer.
10k iterations processing a file of 6k lines totalling 200kB: 1.62s vs 6.92s.
100 iterations processing a file of 6k lines totalling 1.3GB: 8.93s vs 86.95s.
Millions of lines would increase the difference a lot more.
Exact code used for timing:
with open(file, "rb") as f:
    first = f.readline()  # Read and store the first line.
    for last in f: pass   # Read all lines, keep final value.
Amendment
A more complex, and harder to read, variation to address comments and issues raised since.
Return empty string when parsing empty file, raised by comment.
Return all content when no delimiter is found, raised by comment.
Avoid relative offsets to support text mode, raised by comment.
UTF16/UTF32 hack, noted by comment.
Also adds support for multibyte delimiters, readlast(b'X<br>Y', b'<br>', fixed=False).
Please note that this variation is really slow for large files because of the non-relative offsets needed in text mode. Modify to your need, or do not use it at all as you're probably better off using f.readlines()[-1] with files opened in text mode.
#!/bin/python3
from os import SEEK_END

def readlast(f, sep, fixed=True):
    r"""Read the last segment from a file-like object.

    :param f: File to read last line from.
    :type  f: file-like object
    :param sep: Segment separator (delimiter).
    :type  sep: bytes, str
    :param fixed: Treat data in ``f`` as a chain of fixed size blocks.
    :type  fixed: bool
    :returns: Last line of file.
    :rtype: bytes, str
    """
    bs = len(sep)
    step = bs if fixed else 1
    if not bs:
        raise ValueError("Zero-length separator.")
    try:
        o = f.seek(0, SEEK_END)
        o = f.seek(o - bs - step)   # - Ignore trailing delimiter 'sep'.
        while f.read(bs) != sep:    # - Until reaching 'sep': Read sep-sized block
            o = f.seek(o - step)    #   and then seek to the block to read next.
    except (OSError, ValueError):   # - Beginning of file reached.
        f.seek(0)
    return f.read()

def test_readlast():
    from io import BytesIO, StringIO
    # Text mode.
    f = StringIO("first\nlast\n")
    assert readlast(f, "\n") == "last\n"
    # Bytes.
    f = BytesIO(b'first|last')
    assert readlast(f, b'|') == b'last'
    # Bytes, UTF-8.
    f = BytesIO("X\nY\n".encode("utf-8"))
    assert readlast(f, b'\n').decode() == "Y\n"
    # Bytes, UTF-16.
    f = BytesIO("X\nY\n".encode("utf-16"))
    assert readlast(f, b'\n\x00').decode('utf-16') == "Y\n"
    # Bytes, UTF-32.
    f = BytesIO("X\nY\n".encode("utf-32"))
    assert readlast(f, b'\n\x00\x00\x00').decode('utf-32') == "Y\n"
    # Multichar delimiter.
    f = StringIO("X<br>Y")
    assert readlast(f, "<br>", fixed=False) == "Y"
    # Make sure you use the correct delimiters.
    seps = { 'utf8': b'\n', 'utf16': b'\n\x00', 'utf32': b'\n\x00\x00\x00' }
    assert "\n".encode('utf8' )     == seps['utf8']
    assert "\n".encode('utf16')[2:] == seps['utf16']
    assert "\n".encode('utf32')[4:] == seps['utf32']
    # Edge cases.
    edges = (
        # Text , Match
        (""    , ""  ),  # Empty file, empty string.
        ("X"   , "X" ),  # No delimiter, full content.
        ("\n"  , "\n"),
        ("\n\n", "\n"),
        # UTF16/32 encoded U+270A (b"\n\x00\n'\n\x00"/utf16)
        (b'\n\xe2\x9c\x8a\n'.decode(), b'\xe2\x9c\x8a\n'.decode()),
    )
    for txt, match in edges:
        for enc, sep in seps.items():
            assert readlast(BytesIO(txt.encode(enc)), sep).decode(enc) == match

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path) as f:
            print(f.readline() , end="")
            print(readlast(f, "\n"), end="")
docs for io module
with open(fname, 'rb') as fh:
    first = next(fh).decode()
    fh.seek(-1024, 2)
    last = fh.readlines()[-1].decode()
The variable value here is 1024: it represents the average string length. I chose 1024 only as an example. If you have an estimate of average line length you could just use that value times 2.
Since you have no idea whatsoever about the possible upper bound for the line length, the obvious solution would be to loop over the file:
for line in fh:
    pass
last = line
You don't need to bother with the binary flag you could just use open(fname).
ETA: Since you have many files to work on, you could create a sample of a couple of dozen files using random.sample and run this code on them to determine the length of the last line. With an a priori large value of the position shift (let's say 1 MB). This will help you to estimate the value for the full run.
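That sampling idea could be sketched like this (the function names and the 1 MB shift are my own choices):

```python
import os
import random

def last_line_length(path, shift=1024 * 1024):
    """Length of the final line, scanning at most `shift` trailing bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(-min(shift, size), os.SEEK_END)
        tail_lines = f.read().splitlines()
    return len(tail_lines[-1]) if tail_lines else 0

def estimate_line_bound(paths, sample_size=24):
    """Upper-bound estimate of last-line length from a random sample."""
    sample = random.sample(paths, min(sample_size, len(paths)))
    return max((last_line_length(p) for p in sample), default=0)
```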
Here's a modified version of SilentGhost's answer that will do what you want.
with open(fname, 'rb') as fh:
    first = next(fh)
    offs = -100
    while True:
        fh.seek(offs, 2)
        lines = fh.readlines()
        if len(lines) > 1:
            last = lines[-1]
            break
        offs *= 2
print first
print last
No need for an upper bound for line length here.
Can you use unix commands? I think using head -1 and tail -n 1 are probably the most efficient methods. Alternatively, you could use a simple fid.readline() to get the first line and fid.readlines()[-1], but that may take too much memory.
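Those shell utilities can also be driven from Python; a sketch assuming POSIX head and tail are on the PATH:

```python
import subprocess

def first_and_last(path):
    """Fetch the first and last line via the head/tail utilities."""
    first = subprocess.run(["head", "-1", path],
                           capture_output=True, text=True, check=True).stdout
    last = subprocess.run(["tail", "-n", "1", path],
                          capture_output=True, text=True, check=True).stdout
    return first, last
```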
This is my solution, compatible also with Python 3. It also manages border cases, but it misses UTF-16 support:
def tail(filepath):
    """
    @author Marco Sulla (marcosullaroma@gmail.com)
    @date May 31, 2016
    """
    try:
        filepath.is_file
        fp = str(filepath)
    except AttributeError:
        fp = filepath
    with open(fp, "rb") as f:
        size = os.stat(fp).st_size
        start_pos = 0 if size - 1 < 0 else size - 1
        if start_pos != 0:
            f.seek(start_pos)
            char = f.read(1)
            if char == b"\n":
                start_pos -= 1
                f.seek(start_pos)
        if start_pos == 0:
            f.seek(start_pos)
        else:
            char = ""
            for pos in range(start_pos, -1, -1):
                f.seek(pos)
                char = f.read(1)
                if char == b"\n":
                    break
        return f.readline()
It's inspired by @Trasp's answer and @AnotherParker's comment.
First open the file in read mode. Then use the readlines() method to read line by line. All the lines are stored in a list. Now you can use list indexing to get the first and last lines of the file.
a = open('file.txt', 'rb')
lines = a.readlines()
if lines:
    first_line = lines[0]
    last_line = lines[-1]
w = open('file.txt', 'r')
print ('first line is : ', w.readline())
for line in w:
    x = line
print ('last line is : ', x)
w.close()
The for loop runs through the lines and x gets the last line on the final iteration.
with open("myfile.txt") as f:
lines = f.readlines()
first_row = lines[0]
print first_row
last_row = lines[-1]
print last_row
Here is an extension of @Trasp's answer that has additional logic for handling the corner case of a file that has only one line. It may be useful to handle this case if you repeatedly want to read the last line of a file that is continuously being updated. Without this, if you try to grab the last line of a file that has just been created and has only one line, IOError: [Errno 22] Invalid argument will be raised.
def tail(filepath):
    with open(filepath, "rb") as f:
        first = f.readline()       # Read the first line.
        f.seek(-2, 2)              # Jump to the second last byte.
        while f.read(1) != b"\n":  # Until EOL is found...
            try:
                f.seek(-2, 1)      # ...jump back the read byte plus one more.
            except IOError:
                f.seek(-1, 1)
                if f.tell() == 0:
                    break
        last = f.readline()        # Read last line.
    return last
Nobody mentioned using reversed:
f=open(file,"r")
r=reversed(f.readlines())
last_line_of_file = r.next()
Getting the first line is trivially easy. For the last line, presuming you know an approximate upper bound on the line length, seek (os.lseek) some amount back from SEEK_END, find the second-to-last line ending, and then readline() the last line.
with open(filename, "rb") as f:#Needs to be in binary mode for the seek from the end to work
first = f.readline()
if f.read(1) == '':
return first
f.seek(-2, 2) # Jump to the second last byte.
while f.read(1) != b"\n": # Until EOL is found...
f.seek(-2, 1) # ...jump back the read byte plus one more.
last = f.readline() # Read last line.
return last
The above answer is a modified version of the above answers which handles the case that there is only one line in the file
