Is there a python library for line based file reading? [duplicate] - python

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Python: How to read huge text file into memory
I want to process a large text file (1 GB+) line by line, with random access to any line by its number and, most importantly, without loading the whole file into RAM. Is there a Python library that does that?
This would be useful for analyzing a large log file; read-only access is enough.
If there is no such standard library, I need an alternative: a set of functions or classes that can return the N-th line from a big string-like object, so that I can mmap the file (yes, I mean a memory-mapped file object) to that object and then do line-based processing.
Thank you.
PS: A log file is almost sure to have variable line length.

I think something like the following might work, since the file object's readline() method reads one line at a time. If the lines are of arbitrary length, you first need to index their starting positions, as follows.
lines = [0]
with open("testmat.txt") as f:
    # lines[k] ends up holding the offset at which line k (counting from 0) starts
    while f.readline():
        lines.append(f.tell())
    # now you can read an arbitrary line (the file must still be open):
    f.seek(lines[1235])
    line = f.readline()
If all lines had the same length, you could skip the index and just do f.seek(linenumber * linelength).
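The offset-index idea above can be wrapped in a small helper. This is only a minimal sketch, not a standard-library API: the class name LineIndex and its methods are made up for illustration. It assumes the file does not change after indexing, and it opens the file in binary mode so that offsets are plain byte positions. Only one integer per line is kept in memory.

```python
import os
import tempfile


class LineIndex:
    """Hypothetical helper: random access to lines via a byte-offset index."""

    def __init__(self, path):
        self._f = open(path, "rb")  # binary mode: offsets are byte positions
        self._offsets = []
        pos = 0
        for raw in self._f:  # one sequential pass; only the offsets are kept
            self._offsets.append(pos)
            pos += len(raw)

    def __len__(self):
        return len(self._offsets)

    def get(self, n, encoding="utf-8"):
        """Return line n (0-based) without its line ending."""
        self._f.seek(self._offsets[n])
        return self._f.readline().decode(encoding).rstrip("\r\n")

    def close(self):
        self._f.close()


# Demo on a throwaway file with variable-length lines.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("first\nsecond line\nthird\n")

index = LineIndex(tmp.name)
print(len(index), index.get(1))  # 3 second line
index.close()
os.remove(tmp.name)
```

Building the index still costs one full pass over the file, but every later lookup is a single seek plus a single readline.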

Related

How insert instead of overwrite at specific position of file [duplicate]

This question already has answers here:
Insert line at middle of file with Python?
(11 answers)
Closed 4 years ago.
I'm trying to insert some text at specific position of file using this:
with open("test.txt","r+") as f:
    f.seek(5)
    f.write("B")
But this overwrites the character at position 5 with the new data ("B") instead of inserting it.
For example, if I have
AAAAAAAAAA
in the file test.txt and run the code,
I get AAAAABAAAA instead of AAAAABAAAAA (there must be five A's after the B).
How can I insert at the desired position of the file instead of overwriting?
There are three answers for that:
The generic file API (the one you can expect on all OSes) has no interface for 'insert' (read: this is impossible).
You can implement this yourself by reading the whole file into memory, composing the new content and writing it back. (If the file is big, you may need to write some code to do this in chunks.)
Good news for Linux users: since Linux 4.1 it is possible to insert a hole in the middle of a file (basically, shifting everything after a given offset forward by a given length). There is a comprehensive article on this topic here: https://lwn.net/Articles/629965/. It is supported on the ext4 and XFS filesystems, and it requires low-level operations on the file descriptor (i.e. not the usual open() in Python). Moreover, when I checked (Sep 2018), the fallocate module on PyPI did not support it, so you need to write low-level code that calls fallocate(2) with the FALLOC_FL_INSERT_RANGE flag. Note that the offset and length must be multiples of the filesystem block size, so this cannot insert a single byte.
TL;DR: If your file is small, read it into memory and do the insert in memory. If your file is medium-sized (1-2 GB), write the result to a temp file and rename it afterwards. If your file is large, use windowed operations or dig down to FALLOC_FL_INSERT_RANGE (if you have a relatively modern Linux).
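The temp-file route for medium-sized files could look roughly like the sketch below. The function name insert_bytes and its parameters are made up for illustration. It assumes the temp file can be created in the same directory as the target, so that the final rename stays on one filesystem, and it copies in chunks so the whole file never has to fit in memory.

```python
import os
import shutil
import tempfile


def insert_bytes(path, offset, data, chunk_size=1 << 20):
    """Insert `data` at byte `offset` of `path` by rewriting it via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            remaining = offset
            while remaining > 0:  # copy the head of the file in chunks
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    break  # offset is past the end: data is simply appended
                dst.write(chunk)
                remaining -= len(chunk)
            dst.write(data)  # the inserted bytes
            shutil.copyfileobj(src, dst, chunk_size)  # copy the tail
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.unlink(tmp_path)
        raise


# Demo with the example from the question.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"AAAAAAAAAA")

insert_bytes(tmp.name, 5, b"B")
with open(tmp.name, "rb") as f:
    print(f.read())  # b'AAAAABAAAAA'
os.remove(tmp.name)
```

One caveat of this sketch: the replacement file gets the permissions that mkstemp assigns, not those of the original, so copy them over if that matters.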
This worked for me:
with open("test.txt","r+") as f:
    f.seek(5)        # first seek to the position
    rest = f.read()  # read everything after it (readline() would stop at the end of the line)
    f.seek(5)        # the file pointer has moved, so seek back to 5
    f.write("B")     # write the letter
    f.write(rest)    # write the remaining part
Original : AAAAAAAAAA
After : AAAAABAAAAA
with open("test.txt", "r+") as f1:
    f1.seek(5)
    data = "{}{}".format("B", f1.read())  # new text followed by the rest of the file
    f1.seek(5)
    f1.write(data)

How does readline() work behind the scenes when reading a text file?

I would like to understand how readline() takes in a single line from a text file. The specific details I would like to know about, with respect to how the compiler interprets the Python language and how this is handled by the CPU, are:
How does the readline() know which line of text to read, given that successive calls to readline() read the text line by line?
Is there a way to start reading a line of text from the middle of a text? How would this work with respect to the CPU?
I am a "beginner" (I have about 4 years of "simpler" programming experience), so I wouldn't be able to understand technical details, but feel free to expand if it could help others understand!
Example using the file file.txt:
fake file
with some text
in a few lines
Question 1: How does the readline() know which line of text to read, given that successive calls to readline() read the text line by line?
When you open a file in Python, it creates a file object. A file object wraps an operating-system file descriptor and keeps track of a current position in the file. When you first open the file for reading, that position is at the beginning of the file. When you call readline(), the position moves forward to the character just after the next newline it reads.
Calling the tell() method of a file object returns the current position.
with open('file.txt', 'r') as fd:
    print(fd.tell())
    fd.readline()
    print(fd.tell())
# output:
0
10
# Or 11, depending on the line separators in the file
Question 2: Is there a way to start reading a line of text from the middle of a text? How would this work with respect to the CPU?
First off, reading a file doesn't really have anything to do with the CPU. It has to do with the operating system and the file system, which together determine how files can be read and written.
For random access in files, you can use the mmap module of python. The Python Module of the Week site has a great tutorial.
Example, jumping to the 2nd line in the example file and reading until the end:
import mmap
import contextlib
with open('file.txt', 'rb') as fd:
    with contextlib.closing(mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ)) as mm:
        # slicing a mmap object returns bytes, so decode before printing
        print(mm[10:].decode())
# output:
with some text
in a few lines
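The slice above relies on already knowing that the second line starts at byte 10. When the offset is not known, the mapped file can be searched for newline bytes instead. The following is a rough sketch: the function name nth_line_mmap is made up, it assumes "\n" line endings and a non-empty file (mapping an empty file raises ValueError), and it returns bytes rather than str.

```python
import mmap
import os
import tempfile


def nth_line_mmap(path, n):
    """Return line n (0-based) as bytes, without the newline, using mmap.find."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = 0
        for _ in range(n):  # skip n newlines to reach the start of line n
            nl = mm.find(b"\n", start)
            if nl == -1:
                raise IndexError("file has fewer lines than requested")
            start = nl + 1
        if start >= len(mm):
            raise IndexError("file has fewer lines than requested")
        end = mm.find(b"\n", start)
        if end == -1:  # last line without a trailing newline
            end = len(mm)
        return mm[start:end]


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"fake file\nwith some text\nin a few lines\n")

print(nth_line_mmap(tmp.name, 1))  # b'with some text'
os.remove(tmp.name)
```

The search still touches every byte before the requested line, but the operating system pages the file in on demand instead of the program reading it all into a Python object.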
This is a very broad question and it's unlikely that all details about what the CPU does would fit in an answer. But a high-level answer is possible:
readline reads each line in order. It starts by reading chunks of the file from the beginning. When it encounters a line break, it returns that line. Each successive invocation of readline returns the next line until the last line has been read. Then it returns an empty string.
with open("myfile.txt") as f:
    while True:
        line = f.readline()
        if not line:
            break
        # do something with the line
Readline uses operating system calls under the hood. The file object corresponds to a file descriptor in the OS, and it has a pointer that keeps track of where in the file we are at the moment. The next read will return the next chunk of data from the file from that point on.
If by "the middle" you mean the middle line, you would have to scan through the file first in order to know how many lines there are, and then use some way of starting from that line. If you meant some arbitrary line other than the first and last, you would have to scan the file from the beginning, identifying lines (for example, you could repeatedly call readline, throwing away the result) until you have reached the line you want. There is a ready-made module for this: linecache. Note that linecache reads the whole file and caches its lines in memory, so it is convenient rather than memory-efficient.
import linecache
linecache.getline("myfile.txt", 5) # we already know we want line 5
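The "call readline repeatedly and throw away the result" approach can also be written with itertools.islice, which skips lines lazily and holds only one line in memory at a time. This is a small sketch; the function name read_line is made up, and it uses 1-based line numbers to match linecache.getline.

```python
import os
import tempfile
from itertools import islice


def read_line(path, n):
    """Return line n (1-based), or "" if the file has fewer than n lines."""
    with open(path) as f:
        # islice consumes and discards the first n-1 lines, then yields line n
        return next(islice(f, n - 1, n), "")


with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("fake file\nwith some text\nin a few lines\n")

print(repr(read_line(tmp.name, 2)))  # 'with some text\n'
os.remove(tmp.name)
```

Each call rescans the file from the start, so for many lookups on the same file an index of line offsets is the better fit.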

Differences between file.read(), file.readline() and iterating over the file object [duplicate]

This question already has answers here:
Python readlines() usage and efficient practice for reading
(2 answers)
Closed 7 years ago.
I am new to computer science and am trying to create a function in python that will open files on my computer.
I know that the function f.readline() grabs the current line as a string, but what makes the functions f.read() and for line in f: different? Thanks.
read(x) reads up to x characters from the file (x bytes if the file was opened in binary mode). If you don't supply a size, the entire file is read.
readline(x) reads up to x characters or up to and including a newline, whichever comes first. If you don't supply a size, it reads all data until it hits a newline.
When you use for line in f, Python calls the file object's next method under the hood (__next__() in Python 3), which really just does something very similar to readline (although I have seen references saying that it may buffer more efficiently, since iterating usually means you are planning to read the entire file).
There is also readlines() which reads all lines into memory.
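The differences are easy to see side by side. The sketch below uses io.StringIO as a stand-in for a file opened in text mode (an assumption made so the example needs no file on disk); a real file object exposes the same methods.

```python
import io

text = "fake file\nwith some text\nin a few lines\n"
f = io.StringIO(text)  # behaves like a file opened in text mode

whole = f.read()              # no size given: everything up to the end
f.seek(0)                     # go back to the start

first = f.readline()          # "fake file\n": up to and including the newline
partial = f.readline(4)       # "with": stops after 4 characters
rest_of_line = f.readline()   # " some text\n": continues from the same position
remaining = [line for line in f]  # iteration yields what is left, line by line

f.seek(0)
all_lines = f.readlines()     # every line at once, as a list

print(repr(partial), repr(rest_of_line), remaining)
```

Note that every call continues from wherever the previous one stopped, which is why readline(4) followed by readline() splits one line into two pieces.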

Deleting the first line of a text file in python [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Editing specific line in text file in python
I am writing software that lets users write data to a text file. However, I am not sure how to delete the first line of the text file and rewrite it. I want the user to be able to update the file's first line by clicking a button and typing something in, but that requires deleting the old first line and writing a new one in its place, which I am not sure how to implement. Any help would be appreciated.
Edit:
So I seeked to the start of the file and tried to write another line, but that doesn't delete the previous line.
file.seek(0)
file.write("This is the new first line \n")
You did not describe how you opened the file to begin with. If you used file = open(somename, "a"), the file will not be truncated, but new data is written at the end (even after a seek, on most if not all modern systems). You would have to open the file with "r+" instead.
But your example assumes that the line you write is exactly the same length as the line it replaces. There is no line organisation in files, just bytes, some of which indicate a line ending.
What you need to do is use a temporary file or a temporary buffer in memory for all the lines, and then write the lines out with the first one replaced.
If things fit in memory (which I assume since few users are going to type so much it does not fit), you should be able to do:
with open(somename, 'r') as f:
    lines = f.readlines()
lines[0] = "This is the new first line \n"
with open(somename, 'w') as f:
    f.writelines(lines)
You could use readlines to get a list of lines and then use del on the first element of the list. This might help: http://www.daniweb.com/software-development/python/threads/68765/how-to-remove-a-number-of-lines-from-a-text-file-
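If the file might not fit in memory, the temporary-file variant mentioned above can be sketched as follows. The function name replace_first_line is made up for illustration, and the sketch assumes UTF-8 text with "\n" line endings. Only the first line is read explicitly; the rest is streamed across unchanged.

```python
import os
import shutil
import tempfile


def replace_first_line(path, new_line):
    """Replace the first line of `path` with `new_line` via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            src.readline()  # skip the old first line
            dst.write(new_line.encode("utf-8") + b"\n")
            shutil.copyfileobj(src, dst)  # stream the remaining lines
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"old first line\nsecond\nthird\n")

replace_first_line(tmp.name, "This is the new first line")
with open(tmp.name, "rb") as f:
    print(f.read())  # b'This is the new first line\nsecond\nthird\n'
os.remove(tmp.name)
```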

Modifying a single line in a file [duplicate]

This question already has answers here:
Editing specific line in text file in Python
(11 answers)
Closed 3 months ago.
Is there a way, in Python, to modify a single line in a file without a for loop looping through all the lines?
The exact positions within the file that need to be modified are unknown.
This should work -
path = r'full_path_to_your_file'  # pass an appropriate path to the required file
with open(path, 'r') as f:
    lines = f.readlines()
# n is the line number you want to edit; subtract 1 because list indexing starts at 0
lines[n-1] = "your new text for this line\n"  # keep the trailing newline, or the next line gets joined to this one
# reopen in write mode to write the file back; you could also open in "r+" mode and use seek, but you would be left with unwanted old data if the new data is shorter
with open(path, 'w') as f:
    f.writelines(lines)
# do the remaining operations on the file
However, this can be resource consuming (both time and memory) if your file size is too large, because the f.readlines() function loads the entire file, split into lines, in a list.
This will be just fine for small and medium sized files.
Unless we're talking about a fairly contrived situation in which you already know a lot about the file, the answer is no. You have to iterate over the file to determine where the newline characters are; there's nothing special about a "line" when it comes to file storage -- it all looks the same.
Yes, you can modify the line in place, but if the length changes, you will have to rewrite the remainder of the file.
You'll also need to know where the line is, in the file. This usually means the program needs to at least read through the file up to the line that needs to be changed.
There are exceptions: for example, if the lines are all of fixed length, or if you have some sort of index on the file.
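The fixed-length exception can be illustrated with a short sketch. Everything here is an assumption made for the example: the record length of 8 bytes (7 bytes of ASCII data plus a newline), the space padding, and the function name write_record. Because every line occupies the same number of bytes, line n starts at byte n * RECORD_LEN and can be overwritten in place without touching the rest of the file.

```python
import os
import tempfile

RECORD_LEN = 8  # assumed layout: 7 data bytes followed by "\n"


def write_record(path, n, text):
    """Overwrite fixed-length line n (0-based) in place, padding with spaces."""
    data = text.encode("ascii")
    if len(data) > RECORD_LEN - 1:
        raise ValueError("text does not fit in one record")
    with open(path, "r+b") as f:
        f.seek(n * RECORD_LEN)  # jump straight to the record, no scanning
        f.write(data.ljust(RECORD_LEN - 1) + b"\n")


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"aaaaaaa\nbbbbbbb\nccccccc\n")

write_record(tmp.name, 1, "xyz")
with open(tmp.name, "rb") as f:
    print(f.read())  # b'aaaaaaa\nxyz    \nccccccc\n'
os.remove(tmp.name)
```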
