I want to know how to edit a file on the fly row by row in python.
For example I have a text file where I usually have:
key value
key value
key value
key value
key value
...
they are not necessarily the same pair for each line. It's just the way I explained it.
I would like to show each line's key and value on my terminal, and then do one of these two things:
-just press enter (or some other hot-key) to move on and show the next line.
-enter a new value, then hit enter. This replaces the value that was being shown in the file, and then shows the next key-value pair.
This continues until the end of the file, or until I type 'quit' or some other keyword; it doesn't matter which.
-Being able to go back to the previous row would be a plus (in case of accidentally going to next row), but it's not too important for now.
I find myself often editing huge files in a very tedious and repetitive way, and text editors are really frustrating with their cursors going everywhere when pressing the arrow-key. Also having to use the backspace to delete is annoying.
I know how to read a file and how to write a file in python, but not in such an interactive way. I only know how to write the whole file at once. Plus, I don't know whether it is safe to open the same file for both reading and writing. I also know how to manipulate each line, split the text into a list of values, etc. All I really need is to understand how to modify the file at the exact current line and handle this type of interaction well.
What is the best way to do this?
All the answers focus on loading the contents of the file in memory, modifying and then on close saving all on disk, so I thought I'd give it a try:
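# Python 2 (print statement, raw_input)
import os
sep = " "
with open("inline-t.txt", "rb+") as fd:
    seekpos = fd.tell()  # byte offset where the current line starts
    line = fd.readline()
    while line:
        print line
        new_value = raw_input(">>> ")
        if new_value == ":q":
            break
        if new_value:
            values = line.split(sep)
            newval = values[0] + sep + new_value + '\n'
            if len(newval) == len(line):
                # Same length: overwrite the line in place.
                fd.seek(seekpos)
                fd.write(newval)
                fd.flush()
                os.fsync(fd)
            else:
                # Different length: rewrite everything from this line onward.
                remaining = fd.read()
                fd.seek(seekpos)
                fd.write(newval + remaining)
                fd.truncate()  # drop leftover bytes if the file got shorter
                fd.flush()
                os.fsync(fd)
                # Re-read the line just written so the cursor ends up after it.
                fd.seek(seekpos)
                line = fd.readline()
        seekpos = fd.tell()
        line = fd.readline()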
The script simply opens the file, reads it line by line, and rewrites a line if the user inputs a new value. If the length of the new data matches the old data, a seek and a write are enough. If the new data has a different size, we need to clean up after ourselves: the remainder of the file is read, appended to the new data, and everything is rewritten to disk. fd.flush and os.fsync(fd) ensure that changes are actually in the file as soon as they are written out. It is not the best solution performance-wise, but I believe it is closer to what was asked.
Also, consider that there might be a few quirks in this code, and I'm sure there's room for optimizing: perhaps one global read at the beginning, to avoid multiple whole-file reads if changes that need adjusting are made often, or something like that.
The way I would go about this is to load all the lines of the text file in a list, and then iterate through that list, changing the values of the list as you go along. Then at the very end (when you get to the last line or whenever you want), you will write that whole list out to the file with the same name, so that way it will overwrite the old file.
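A minimal sketch of that load-everything approach, in Python 3. The function name edit_pairs and the ask callback are made up for illustration; ask stands in for whatever prompts the user, and the "-" answer for stepping back is just one possible convention:

```python
def edit_pairs(path, ask, sep=" "):
    """Walk through `path` line by line, letting `ask` supply new values.

    `ask(line)` is a hypothetical callback: it returns "" to keep the line,
    ":q" to stop, "-" to step back one line, or a replacement value.
    """
    with open(path) as f:
        lines = f.read().splitlines()

    i = 0
    while i < len(lines):
        answer = ask(lines[i])
        if answer == ":q":
            break
        if answer == "-":
            i = max(i - 1, 0)  # go back to the previous row
            continue
        if answer:
            key = lines[i].split(sep, 1)[0]
            lines[i] = key + sep + answer
        i += 1

    # Rewrite the whole file once, at the end.
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)
```

For interactive use you could pass something like lambda line: input(line + " >>> ") as ask. Nothing reaches the disk until the loop ends, so a crash in the middle loses the edits made so far.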
Related
Is there a way to precurse a write function in python (I'm working with fasta files but any write function that works with text files should work)?
The only way I could think is to read the whole file in as an array and count the number of lines I want to start at and just re-write that array, at that value, to a text file.
I was just thinking there might be a write option or something somewhere.
I would add some code, but I'm writing it right now, and everyone on here seems to be pretty well versed, and probably know what I'm talking about. I'm an EE in the CS domain and just calling on the StackOverflow community to enlighten me.
From what I understand you want to truncate a file from the start - i.e remove the first n lines.
Then no - there is no way to do that without reading in the lines and skipping the ones you don't want. This is what I would do:
import shutil

remove_to = 5  # Remove lines 0 to 5

try:
    # The output file must be opened for writing ('w')
    with open('precurse_me.txt') as inp, open('temp.txt', 'w') as out:
        for index, line in enumerate(inp):
            if index <= remove_to:
                continue
            out.write(line)
    # If you don't want to replace the original file - delete this
    shutil.move('temp.txt', 'precurse_me.txt')
except Exception:
    raise
Here I open a file for the output and then use shutil.move() to replace the input file only after the processing (the for loop) is complete. I do this so that I don't break the 'precurse_me.txt' file in case the processing fails. I wrap the whole thing in a try/except so that if anything fails it doesn't try to move the file by accident.
The key is the for loop - read the input file line by line; using the enumerate() function to count the lines as they come in.
Ignore those lines (by using continue) until the index says to not ignore the line - after that simply write each line to the out file.
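The same idea can be written a bit more compactly. This is only a sketch: it swaps the enumerate/continue loop for itertools.islice and uses a temporary file in the same directory instead of a fixed 'temp.txt'. The function name drop_first_lines is made up for illustration:

```python
import itertools
import os
import tempfile


def drop_first_lines(path, count):
    """Rewrite `path` without its first `count` lines."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        # islice still reads the skipped lines; it just doesn't yield them.
        dst.writelines(itertools.islice(src, count, None))
    # Only reached if the copy succeeded, so the original is never half-written.
    os.replace(dst.name, path)
```

Calling drop_first_lines('precurse_me.txt', 6) would drop lines 0 to 5, matching the example above.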
This is an issue of trying to reach to the line to start from and proceed from there in the shortest time possible.
I have a huge text file that I'm reading and performing operations on, line after line. I am currently keeping track of the line number that I have parsed, so that in case of a system crash I know how much I'm done with.
How do I resume reading the file from that point, without starting over from the beginning?
import os

count = 0
all_parsed = os.listdir("urltextdir/")
with open(filename, "r") as readfile:
    for eachurl in readfile:
        if str(count) + ".txt" not in all_parsed:
            urltext = getURLText(eachurl)
            with open("urltextdir/" + str(count) + ".txt", "w") as writefile:
                writefile.write(urltext)
            result = processUrlText(urltext)
            saveinDB(result)
        count += 1  # line number of the next URL
This is what I'm currently doing, but when it crashes at a million lines, I have to go through all those lines in the file to reach the point I want to start from. My other alternative is to use readlines and load the entire file in memory.
Is there an alternative that I can consider?
Unfortunately, a line number isn't really a basic position for file objects, and the special seeking/telling functions are ruined by next, which is called in your for loop. You can't jump to a line, but you can jump to a byte position. So one way would be:
line = readfile.readline() #Must use `readline`!
while line:
    lastell = readfile.tell()
    print(lastell) #This is the location of the imaginary cursor in the file after reading the line
    print(line) #Do with line what you would normally do
    line = readfile.readline() #Advance loop
Now you can easily jump back with
readfile.seek(lastell) #You need to keep the last lastell
You would need to keep saving lastell to a file or printing it so on restart you know which byte you're starting at.
Unfortunately you can't use the written file for this, as any modification to the character amount will ruin a count based on this.
Here is one full implementation. Create a file called tell and put 0 inside of it, and then you can run:
with open('tell','r+') as tfd:
    with open('abcdefg') as fd:
        fd.seek(int(tfd.readline())) #Get last position
        line = fd.readline() #Init loop
        while line:
            print(line.strip(),fd.tell()) #Action on line
            tfd.seek(0) #Go back to the start and
            tfd.write(str(fd.tell())) #overwrite with new position only if successful
            line = fd.readline() #Advance loop
You can check if such a file exists and create it in the program of course.
As @Edwin pointed out in the comments, you may want to call fd.flush() and os.fsync(fd.fileno()) (import os, if that isn't clear) after every write to make sure your file contents are actually on disk. This would apply to both write operations you are doing, the tell file being the quicker of the two, of course. This may slow things down considerably for you, so if you are satisfied with the synchronicity as is, do not use that, or only flush the tfd. You can also specify the buffer size when calling open so Python automatically flushes faster, as detailed in https://stackoverflow.com/a/3168436/6881240.
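As a rough sketch of what flushing the position file could look like (the helper name save_position is made up, and tfd is assumed to be a text file opened in a read/write mode such as 'r+'):

```python
import os


def save_position(tfd, position):
    """Overwrite the checkpoint file with `position` and force it to disk."""
    tfd.seek(0)
    tfd.write(str(position))
    tfd.truncate()          # drop leftovers from a longer previous value
    tfd.flush()             # Python's buffer -> operating system
    os.fsync(tfd.fileno())  # operating system -> disk
```

Calling save_position(tfd, fd.tell()) after each processed line would replace the seek/write pair in the loop, at the cost of one fsync per line.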
If I got it right, you could make a simple log file to store the count in.
Still, I would recommend using many files, or storing every line or paragraph in a database like SQL or MongoDB.
I guess it depends on what system your script is running on, and what resources (such as memory) you have available.
But with the popular saying "memory is cheap", you can simply read the file into memory.
As a test, I created a file with 2 million lines, each line 1024 characters long with the following code:
ms = 'a' * 1024
with open('c:\\test\\2G.txt', 'w') as out:
    for _ in range(0, 2000000):
        out.write(ms+'\n')
This resulted in a 2 GB file on disk.
I then read the file into a list in memory, like so:
with open('c:\\test\\2G.txt', 'r') as f:
    my_file_as_list = f.readlines()
I checked the python process, and it used a little over 2 GB in memory (on a 32 GB system)
Access to the data was very fast, and can be done by list slicing methods.
You need to keep track of the index of the list, when your system crashes, you can start from that index again.
But more important... if your system is "crashing" then you need to find out why it is crashing... surely a couple of million lines of data is not a reason to crash anymore these days...
I have 2 files, passwd and dictionary. The passwd file is a test file with one word, while the dictionary has a list of a few lines of words. My program so far reads and compares only the first line of the dictionary file. For example, my dictionary file contains (egg, fish, red, blue) and my passwd file contains only (egg).
The program runs just fine, but once I move the word egg in the dictionary file to, let's say, the last position in the list, the program won't read it and won't pull up results.
My code is below.
#!/usr/bin/passwd
import crypt
def testPass(line):
    e = crypt.crypt(line,"HX")
    print e
def main():
    dictionary = open('dictionary', 'r')
    password = open('passwd', 'r')
    for line in dictionary:
        for line2 in password:
            if line == line2:
                testPass(line2)
    dictionary.close()
    password.close()
main()
If you do
for line in file_obj:
    ....
you are implicitly reading the file line by line, advancing the file pointer with each iteration. This means that after the inner loop is done for the first time, it will no longer be executed, because there are no more lines to read.
One possible solution is to keep one file, preferably the smaller one, in memory using readlines. This way, you can iterate over it for each line you read from the other file.
file_as_list = file_obj.readlines()
for line in file_obj_2:
    for line2 in file_as_list:
        ..
Once your inner loop runs once, it will have reached the end of the password file. When the outer loop hits its second iteration, there's nothing left to read in the password file because you haven't moved the file pointer back to the start of the file.
There are many solutions to the problem. You can use seek to move the file pointer back to the start. Or, you can read the whole password file once and save the data in a list. Or, you can reopen the file on every iteration of the outer loop. The choice of which is best depends on the nature of the data (how many lines there are, are they on a slow network share or fast local disk?) and what your performance requirements are.
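Here is a small sketch of the "read the password file once" option. The function name matching_lines is made up, and returning the matches as a list (instead of calling testPass) is an assumption made to keep the example self-contained:

```python
def matching_lines(dictionary_path, passwd_path):
    """Return the dictionary words that also appear in the passwd file."""
    # Read the (small) passwd file once; a list can be iterated repeatedly,
    # unlike a file object, which is exhausted after one pass.
    with open(passwd_path) as passwd:
        passwords = [line.strip() for line in passwd]

    matches = []
    with open(dictionary_path) as dictionary:
        for line in dictionary:
            word = line.strip()
            if word in passwords:
                matches.append(word)
    return matches
```

With the seek approach you would instead keep both files open and call password.seek(0) at the top of each outer iteration, which re-reads the password file every time.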
I am simply iterating through an external file (which contains a phrase) and want to see if a line exists which has the word 'Dad' in it. If I find it, I want to replace it with 'Mum'. Here is the program I've built... but I'm not sure why it isn't working?!
message_file = open('test.txt','w')
message_file.write('Where\n')
message_file.write('is\n')
message_file.write('Dad\n')
message_file.close()
message_temp_file = open('testTEMP.txt','w')
message_file = open('test.txt','r')
for line in message_file:
    if line == 'Dad': # look for the word
        message_temp_file.write('Mum') # replace it with mum in temp file
    else:
        message_temp_file.write(line) # else, just write the word
message_file.close()
message_temp_file.close()
import os
os.remove('test.txt')
os.rename('testTEMP.txt','test.txt')
This should be so simple...it's annoyed me! Thanks.
You don't have any lines that are "Dad". You have a line that is "Dad\n", but no "Dad". In addition, since you've done message_file.read(), the cursor is at the end of your file, so for line in message_file will hit StopIteration immediately and the loop body never runs. You should do message_file.seek(0) just before your for loop.
print(message_file.read())
message_file.seek(0)
for line in message_file:
    if line.strip() == "Dad":
        ...
That should put the cursor back at the beginning of the file, and strip out the newline and get you what you need.
Note that this exercise is a great example of how not to do things in general! The better implementation would have been:
in_ = message_file.read()
out = in_.replace("Dad","Mum")
message_temp_file.write(out)
print(message_file.read())
Here you have already read the whole file, so nothing is left for the for loop to check.
A file object always remembers where it stopped reading or writing the last time you accessed it.
So if you call print(message_file.readline()), the first line of the file is read and printed. The next time you call the same command, the second line is read and printed, and so on until you reach the end of the file. By calling print(message_file.read()) you have read the whole file, and any further call of read or readline will give you nothing.
You can get the current position with message_file.tell() and set it to a certain value with message_file.seek(value), or simply reopen the file.
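A tiny demonstration of that cursor behaviour. An in-memory io.StringIO is used here as a stand-in for the opened test.txt, purely so the snippet runs without creating a file:

```python
import io

f = io.StringIO("Where\nis\nDad\n")  # stands in for an open text file

first = f.readline()   # "Where\n"
position = f.tell()    # just past the first line
rest = f.read()        # "is\nDad\n"; the cursor is now at the end
nothing = f.read()     # "" because nothing is left to read
f.seek(position)       # jump back to just after the first line
again = f.readline()   # "is\n"
```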
The problem most likely is due to the fact that your conditional will only match the string "Dad", when the string is actually "Dad\n". You could either update your conditional to:
if line == "Dad\n":
OR
if "Dad" in line:
Lastly, you also read the entire file when you call print(message_file.read()). You either need to remove that line, or you need to put a call to message_file.seek(0) in order for the loop that follows to actually do anything.
I have some data stored in a .txt file in this format:
----------|||||||||||||||||||||||||-----------|||||||||||
1029450386abcdefghijklmnopqrstuvwxy0293847719184756301943
1020414646canBeFollowedBySpaces 3292532113435532419963
don't ask...
I have many lines of this, and I need a way to add more digits to the end of a particular line.
I've written code to find the line I want, but I'm stumped as to how to add 11 characters to the end of it. I've looked around, and this site has been helpful with some other issues I've run into, but I can't seem to find what I need for this.
It is important that the line retain its position in the file, and its contents in their current order.
Using python3.1, how would you turn this:
1020414646canBeFollowedBySpaces 3292532113435532419963
into
1020414646canBeFollowedBySpaces 329253211343553241996301846372998
As a general principle, there's no shortcut to "inserting" new data in the middle of a text file. You will need to make a copy of the entire original file in a new file, modifying your desired line(s) of text on the way.
For example:
import os

with open("input.txt") as infile:
    with open("output.txt", "w") as outfile:
        for s in infile:
            s = s.rstrip("\n") # remove trailing newline
            if "target" in s:
                s += "0123456789"
            print(s, file=outfile)
os.rename("input.txt", "input.txt.original")
os.rename("output.txt", "input.txt")
Check out the fileinput module; it can do sort of "in-place" edits of files, though I believe temporary files are still involved in the internal process.
import fileinput
for line in fileinput.input('input.txt', inplace=True, backup='.orig'):
    if line.startswith('1020414646canBeFollowedBySpaces'):
        line = line.rstrip() + '01846372998\n'
    print(line, end='')
The print now prints to the file instead of the console.
You might want to back up your original file before editing.
target_chain = b'1020414646canBeFollowedBySpaces 3292532113435532419963'
to_add = b'01846372998'
with open('zaza.txt','rb+') as f:
    ch = f.read()  # bytes, because the file is opened in binary mode
    x = ch.find(target_chain)
    if x != -1:  # only write if the target was found
        f.seek(x + len(target_chain),0)
        f.write(to_add)
        f.write(ch[x + len(target_chain):])
In this method it is absolutely necessary to open the file in binary mode 'b'. In text mode, Python translates line endings (see Universal Newline, enabled by default), so positions computed on the string that was read may not match byte positions in the file.
The mode 'r+' allows writing as well as reading.
In this method, what is before the target_chain in the file remains untouched, and what is after the target_chain is shifted ahead. As Greg Hewgill said, there is no way to move bits apart on a hard disk to insert new bits in the middle.
Evidently, if the file is very big, reading all of its content into ch could consume too much memory, and the algorithm should then be changed: read line after line until the line containing the target_chain, then read the next line before inserting, and then keep repeating "read the next line, rewrite over the current line" until the end of the file, in order to progressively shift the content that follows the addition.
You see what I mean...
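For what it's worth, here is one way the low-memory version could look. Note that this is a variant, not the line-by-line forward shift described above: it moves the tail of the file in fixed-size chunks, starting from the end, so nothing is overwritten before it has been copied. The function name insert_bytes and the chunk_size parameter are made up for illustration:

```python
import os


def insert_bytes(path, offset, data, chunk_size=64 * 1024):
    """Insert `data` at byte `offset`, shifting the tail forward chunk by chunk."""
    shift = len(data)
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # start at the end of the file
        # Copy the tail backward so each chunk lands on already-copied space.
        while pos > offset:
            start = max(offset, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            f.seek(start + shift)
            f.write(chunk)
            pos = start
        # The gap at `offset` is now free for the new bytes.
        f.seek(offset)
        f.write(data)
```

Here offset would be x + len(target_chain) from the snippet above, found by scanning the file line by line instead of reading it whole. Only chunk_size bytes are held in memory at a time.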
Copy the file, line by line, to another file. When you get to the line that needs extra chars then add them before writing.