Is there a way to precurse a write function in python (I'm working with fasta files but any write function that works with text files should work)?
The only way I could think is to read the whole file in as an array and count the number of lines I want to start at and just re-write that array, at that value, to a text file.
I was just thinking there might be a write option or something similar somewhere.
I would add some code, but I'm writing it right now, and everyone on here seems to be pretty well versed, and probably know what I'm talking about. I'm an EE in the CS domain and just calling on the StackOverflow community to enlighten me.
From what I understand, you want to truncate a file from the start, i.e. remove the first n lines.
Then no, there is no way to do that without reading in the lines and skipping the ones you don't want. This is what I would do:
import shutil

remove_to = 5  # Remove lines 0 to 5
try:
    # The output file must be opened for writing ('w')
    with open('precurse_me.txt') as inp, open('temp.txt', 'w') as out:
        for index, line in enumerate(inp):
            if index <= remove_to:
                continue
            out.write(line)
    # If you don't want to replace the original file - delete this
    shutil.move('temp.txt', 'precurse_me.txt')
except Exception as e:
    raise e
Here I open a file for the output and then use shutil.move() to replace the input file only after the processing (the for loop) is complete. I do this so that I don't break the 'precurse_me.txt' file in case the processing fails. I wrap the whole thing in a try/except so that if anything fails it doesn't try to move the file by accident.
The key is the for loop - read the input file line by line; using the enumerate() function to count the lines as they come in.
Ignore those lines (by using continue) until the index says to not ignore the line - after that simply write each line to the out file.
Related
I know the normal way to write to a file by rewriting the lines except the ones to delete. But I want to know is there an efficient way to delete or update a line in place or append at the last in a file using file pointers in Python.
Appending to the end is easy:
with open('somefile', 'a') as f:
    f.write(line)  # Or with print to add a newline for you, print(line, file=f)
In the middle, you're generally stuck; unless the new line is exactly the same length as the existing line, you'll have to move all the data after that line around to make it work, and that risks data corruption if anything (including non-software issues like a power outage) goes wrong. In that case, just write a new file, and use os.replace to atomically replace the old file with the new file after the new file is written out completely.
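A minimal sketch of that write-then-replace approach, assuming the edit is "swap one line for another". The helper name replace_line and its parameters are made up for illustration; tempfile.mkstemp and os.replace are the standard-library calls doing the work.

```python
import os
import tempfile


def replace_line(path, line_number, new_line):
    """Replace the 0-based line `line_number` of `path` with `new_line`.

    Hypothetical helper: the new content goes to a temporary file in the
    same directory, which is swapped in only once it is complete.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as out, open(path) as inp:
            for index, line in enumerate(inp):
                out.write(new_line + '\n' if index == line_number else line)
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: discard the partial temp file.
        os.remove(tmp_path)
        raise
```

If anything fails before os.replace runs, the original file is untouched. One thing to be aware of: mkstemp creates the file readable and writable only by the current user, so the replaced file may end up with different permissions than the original.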
Alternatively, I tried this, but it doesn't seem to get rid of the rows that have blank spaces (blank rows are included in the number of rows I'd like to delete). Meanwhile, the code above appears to get rid of those blank rows, but there is a line-termination problem.
next(filecsv) for i in range(10)
Use fileinput.input() with the inplace update file option:
from __future__ import print_function
import fileinput

skip_rows = int(input('How many rows to skip? '))
f = fileinput.input('input.csv', inplace=True)
for i in range(skip_rows):
    f.readline()
for row in f:
    print(row, end='')
This will skip the first skip_rows rows of the input file and overwrite it without you having to manage writing and moving a temporary file.
(You can omit importing print_function if you are using Python 3)
There are quite a few ways to grab input from a command line tool (which is what I am inferring you wrote). Here are a couple:
Option 1: use sys.argv
Create a file called out.py:
import sys
arg1 = sys.argv[1]
print("passed in value: %s" % arg1)
Then run it by passing in an argument (note index 1, script is index 0)
python out.py cell1
passed in value: cell1
Option 2:
A potentially better way is to use a commandline tool framework like click: http://click.pocoo.org/5/. This has almost everything you could ever want to do, and they handle much of the hard logic for you.
You can prompt the user with a simple while loop and listen into standard input or using the input() function.
As to your question on how to delete lines in a file, you can read in the file as a list of lines.
lines = []
with open('input.txt') as f:
    lines = f.readlines()
You can then write back into the file everything past the lines you want to skip by using list slicing.
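As a rough sketch of that slicing step (the helper name drop_first_lines is hypothetical, and this assumes the file fits comfortably in memory):

```python
def drop_first_lines(path, n):
    """Rewrite `path` without its first `n` lines (hypothetical helper)."""
    with open(path) as f:
        lines = f.readlines()
    # lines[n:] is everything from index n onward, so the first n lines are dropped.
    with open(path, 'w') as f:
        f.writelines(lines[n:])
```

For example, drop_first_lines('input.txt', 3) would leave the file starting at what used to be its fourth line.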
Also I am pretty sure similar questions have been asked before, try to Google or search Stack Overflow for your question or a subset of your question next time.
P.S.
I also want to add that if you are reading a very large file, it would be better if you read a line at a time, and outputted to a separate file. For a large enough file, you might run out of RAM to hold the file in memory.
Firstly I have opened up the csv file [...]
Have you considered using pandas to process your data?
If so, pandas.read_csv lets you skip lines using the skiprows parameter.
You will typically use an iterator to read files. You could do something like this:
numToSkip = 3
with open('somefile.txt') as f:
    for i, line in enumerate(f):
        if i < numToSkip:
            continue
        # Do 'whatnot' processing here
Noob question here. I'm scheduling a cron job for a Python script for every 2 hours, but I want the script to stop running after 48 hours, which is not a feature of cron. To work around this, I'm recording the number of executions at the end of the script in a text file using a tally mark x and opening the text file at the beginning of the script to only run if the count is less than n.
However, my script seems to always run regardless of the conditions. Here's an example of what I've tried:
with open("curl-output.txt", "a+") as myfile:
    data = myfile.read()
    finalrun = "xxxxx"
    if data != finalrun:
        [CURL CODE]
    with open("curl-output.txt", "a") as text_file:
        text_file.write("x")
        text_file.close()
I think I'm missing something simple here. Please advise if there is a better way of achieving this. Thanks in advance.
The problem with your original code is that you're opening the file in a+ mode, which seems to set the seek position to the end of the file (try print(data) right after you read the file). If you use r instead, it works. (I'm not sure that's how it's supposed to be. This answer states it should write at the end, but read from the beginning. The documentation isn't terribly clear).
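A quick way to check this yourself, assuming Python 3 (the temporary file name here is only for the demonstration): read right after opening in a+ mode, then rewind with seek(0) and read again.

```python
import os
import tempfile

# Throwaway file for the demonstration; the name is arbitrary.
path = os.path.join(tempfile.mkdtemp(), 'tally.txt')
with open(path, 'w') as f:
    f.write('xxx')

with open(path, 'a+') as f:
    at_end = f.read()       # the position starts at the end of the file
    f.seek(0)               # rewind to the beginning
    from_start = f.read()   # now the existing content comes back

print(repr(at_end), repr(from_start))
```

So if you do want to keep a+ mode, calling seek(0) before read() is another option besides switching to r.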
Some suggestions: Instead of comparing against the "xxxxx" string, you could just check the length of the data (if len(data) < 5). Or alternatively, as was suggested, use pickle to store a number, which might look like this:
import pickle

try:
    with open("curl-output.txt", "rb") as myfile:
        num = pickle.load(myfile)
except FileNotFoundError:
    num = 0
if num < 5:
    do_curl_stuff()
    num += 1
    with open("curl-output.txt", "wb") as myfile:
        pickle.dump(num, myfile)
Two more things concerning your original code: You're making the first with block bigger than it needs to be. Once you've read the string into data, you don't need the file object anymore, so you can remove one level of indentation from everything except data = myfile.read().
Also, you don't need to close text_file manually. with will do that for you (that's the point).
This sounds more like a job for the at command.
See http://www.ibm.com/developerworks/library/l-job-scheduling/ for different job scheduling mechanisms.
The first bug that is immediately obvious to me is that you are appending to the file even if data == finalrun. So when data == finalrun, you don't run curl but you do append another 'x' to the file. On the next run, data will be not equal to finalrun again so it will continue to execute the curl code.
The solution is of course to nest the code that appends to the file under the if statement.
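A sketch of what that could look like, keeping the tally-mark file from the question. The function name run_if_under_limit and the job callback are placeholders standing in for the curl code, not anything from the original script.

```python
import os


def run_if_under_limit(tally_path, limit, job):
    """Call `job()` and record one tally mark, unless `limit` runs are already recorded."""
    data = ''
    if os.path.exists(tally_path):
        with open(tally_path) as f:   # plain 'r' mode reads from the start
            data = f.read()
    # Counting the marks ignores any stray newline in the file.
    if data.count('x') >= limit:
        return False
    job()
    # Only append the tally mark when the job actually ran.
    with open(tally_path, 'a') as f:
        f.write('x')
    return True
```

Because the append now happens only after the job runs, the file stops growing once the limit is reached and the check keeps failing on every later run.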
Well, there is probably a newline character (\n) at the end of the line, which means your file contains something like xx\n and not simply xx. This is probably why your condition does not work :)
EDIT
What happens if you type the following at the Python command line?
open('filename.txt', 'r').read() # where filename is the name of your file
You will be able to see whether there is a \n or not.
Try using this condition in the if clause instead:
if data.count('x') == 24:
The data string may contain extraneous characters such as newlines. Check repr(data) to see whether it is actually 24 x's.
I want to know how to edit a file on the fly row by row in python.
For example I have a text file where I usually have:
key value
key value
key value
key value
key value
...
they are not necessarily the same pair for each line. It's just the way I explained it.
I would like to show each line's key and value on my terminal, and then do one of these two things:
- just press enter (or some other hot-key) to go ahead and show the next line.
- enter a new value, then hit enter. This replaces the value being shown in the file and then moves on to show the next key/value pair.
This continues until the end of the file, or possibly until I type 'quit' or some other keyword; it doesn't matter.
- Being able to go back to the previous row would be a plus (in case of accidentally going to the next row), but it's not too important for now.
I find myself often editing huge files in a very tedious and repetitive way, and text editors are really frustrating with their cursors going everywhere when pressing the arrow-key. Also having to use the backspace to delete is annoying.
I know how to read a file and how to write a file in python. But not in such interactive way. I only know how to write the whole file at once. Plus I wouldn't know if it is safe to open the same file in both reading and writing. Also I know how to manipulate each line, split the text in a list of values etc... all I really need is to understand how to modify the file at that exact current line and handle well this type of interaction.
what is the best way to do this?
All the answers focus on loading the contents of the file in memory, modifying and then on close saving all on disk, so I thought I'd give it a try:
import os

sep = " "
with open("inline-t.txt", "rb+") as fd:
    seekpos = fd.tell()
    line = fd.readline()
    while line:
        print line
        next = raw_input(">>> ")
        if next == ":q":
            break
        if next:
            values = line.split(sep)
            newval = values[0] + sep + next + '\n'
            if len(newval) == len(line):
                # Same length: overwrite the line where it sits
                fd.seek(seekpos)
                fd.write(newval)
                fd.flush()
                os.fsync(fd)
            else:
                # Different length: rewrite everything from this line on
                remaining = fd.read()
                fd.seek(seekpos)
                fd.write(newval + remaining)
                fd.truncate()  # drop leftover bytes if the file got shorter
                fd.flush()
                os.fsync(fd)
                fd.seek(seekpos)
                line = fd.readline()
        seekpos = fd.tell()
        line = fd.readline()
The script simply opens the file, reads line by line, and rewrites it if the user inputs a new value. If the length of the data matches previous data, seek and write are enough. If the new data is of different size, we need to clean-up after us. So the remainder of the file is read, appended to the new data, and everything is rewritten to disk. fd.flush and os.fsync(fd) guarantee that changes are indeed available in the file as soon as it is written out. Not the best solution, performance-wise, but I believe this is closer to what he asked.
Also, consider that there might be a few quirks in this code, and I'm sure there's room for optimizing: perhaps one global read at the beginning to avoid multiple whole-file reads if changes that need adjusting are made often, or something like that.
The way I would go about this is to load all the lines of the text file in a list, and then iterate through that list, changing the values of the list as you go along. Then at the very end (when you get to the last line or whenever you want), you will write that whole list out to the file with the same name, so that way it will overwrite the old file.
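A minimal Python 3 sketch of that load, edit, write-back approach. The function edit_values and its get_new_value callback are hypothetical names; the callback stands in for however you prompt the user, and returns an empty string to keep the current value.

```python
def edit_values(path, get_new_value, sep=' '):
    """Walk "key value" lines and rewrite the file with any replaced values.

    `get_new_value(key, value)` is a hypothetical callback: it returns the
    new value, or an empty string to leave the line unchanged.
    """
    with open(path) as f:
        lines = f.read().splitlines()

    for i, line in enumerate(lines):
        key, _, value = line.partition(sep)
        new_value = get_new_value(key, value)
        if new_value:
            lines[i] = key + sep + new_value

    # Overwrite the original file only after every line has been processed.
    with open(path, 'w') as f:
        f.writelines(line + '\n' for line in lines)
```

For interactive use, something like edit_values('inline-t.txt', lambda key, value: input('%s %s >>> ' % (key, value))) would show each pair and take the replacement from the keyboard.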
I am simply iterating through an external file (which contains a phrase) and want to see if a line exists that has the word 'Dad' in it. If I find it, I want to replace it with 'Mum'. Here is the program I've built, but I'm not sure why it isn't working.
message_file = open('test.txt','w')
message_file.write('Where\n')
message_file.write('is\n')
message_file.write('Dad\n')
message_file.close()
message_temp_file = open('testTEMP.txt','w')
message_file = open('test.txt','r')
for line in message_file:
    if line == 'Dad': # look for the word
        message_temp_file.write('Mum') # replace it with mum in temp file
    else:
        message_temp_file.write(line) # else, just write the word
message_file.close()
message_temp_file.close()
import os
os.remove('test.txt')
os.rename('testTEMP.txt','test.txt')
This should be so simple...it's annoyed me! Thanks.
You don't have any lines that are "Dad". You have a line that is "Dad\n", but no "Dad". In addition, since you've done message_file.read(), the cursor is at the end of your file, so for line in message_file will finish immediately without running its body. You should do message_file.seek(0) just before your for loop.
print(message_file.read())
message_file.seek(0)
for line in message_file:
    if line.strip() == "Dad":
        ...
That should put the cursor back at the beginning of the file, and strip out the newline and get you what you need.
Note that this exercise is a great example of how not to do things in general! The better implementation would have been:
in_ = message_file.read()
out = in_.replace("Dad","Mum")
message_temp_file.write(out)
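Put together as a self-contained sketch (the helper name replace_in_file is made up, and this version writes straight back to the same file instead of a temp file, which assumes the file is small enough to hold in memory):

```python
def replace_in_file(path, old, new):
    """Replace every occurrence of `old` with `new` in the file at `path`."""
    with open(path) as f:
        text = f.read()
    with open(path, 'w') as f:
        f.write(text.replace(old, new))
```

Calling replace_in_file('test.txt', 'Dad', 'Mum') would then do the whole job without any line-by-line comparison, so the trailing newline never gets in the way.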
print(message_file.read())
Here you have already read the whole file.
Nothing is left for the for loop to check.
A file object always remembers where it stopped to read/write the last time you accessed it.
So if you call print(message_file.readline()), the first line of the file is read and printed. The next time you call the same command, the second line is read and printed, and so on until you reach the end of the file. By using print(message_file.read()) you have read the whole file, and any further call of read or readline will give you nothing.
You can get the current position with message_file.tell() and set it to a certain value with message_file.seek(value), or simply reopen the file.
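A small demonstration of that behaviour, using a throwaway file with the same three lines as the question (the temporary path is only for the example):

```python
import os
import tempfile

# Throwaway file with the same content as test.txt in the question.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('Where\nis\nDad\n')

with open(path) as message_file:
    first = message_file.readline()   # reads the first line only
    rest = message_file.read()        # reads everything after the first line
    leftover = message_file.read()    # empty: the position is at the end
    message_file.seek(0)              # rewind to the start
    again = message_file.readline()   # the first line once more

print(repr(first), repr(rest), repr(leftover), repr(again))
```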
The problem most likely is due to the fact that your conditional will only match the string "Dad", when the string is actually "Dad\n". You could either update your conditional to:
if line == "Dad\n":
OR
if "Dad" in line:
Lastly, you also read the entire file when you call print(message_file.read()). You either need to remove that line, or you need to put a call to message_file.seek(0) in order for the loop that follows to actually do anything.