So I have a file that contains this:
SequenceName 4.6e-38 810..924
SequenceName_FGS_810..924 VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
SequenceName 1.6e-38 887..992
SequenceName_GYQ_887..992 PLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
I want my program to read only the lines that contain these protein sequences. Up until now I have this, which skips the first line and reads the second one:
handle = open(filename, "r")
handle.readline()
linearr = handle.readline().split()
handle.close()
fnamealpha = filename + ".txt"
handle = open(fnamealpha, "w")
handle.write(">%s\n%s\n" % (linearr[0], linearr[1]))
handle.close()
But it only processes the first sequence, and I need it to process every line that contains a sequence, so I need a loop. How can I do it?
The part that saves to a .txt file is really important too, so I need to find a way to combine these two objectives.
My output with the above code is:
>SequenceName_810..924
VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
Okay, I think I understand your question: you want to iterate over the lines in the file, but only the second line of each record, the one with the protein sequence, matters, correct? Here's my suggestion:
# the context manager `with` takes care of file closing, even on errors
with open(filename, 'r') as handle, open(filename + '.txt', 'w') as out:
    for line in handle:
        if line.startswith('SequenceName_'):
            name, sequence = line.split()
            out.write(">%s\n%s\n" % (name, sequence))
My reasoning is that you're only interested in the lines that start with SequenceName_.
Use readlines and throw it all into a for loop.
with open(filename, 'r') as fh:
    for line in fh.readlines():
        # do processing here
In the # do processing here section, you can prepare another list of lines to write to the other file, as the sketch below shows. (Using with handles closing the file properly for you.)
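For instance, a minimal sketch of that idea; the output filename and the startswith filter are assumptions based on the question:

with open(filename, 'r') as fh, open(filename + '.out.txt', 'w') as out:
    lines_to_write = []
    for line in fh:
        if line.startswith('SequenceName_'):  # keep only the sequence lines
            lines_to_write.append(line)
    out.writelines(lines_to_write)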
I have a text file like this:
Bruce
brucechungulloa#outlook.com
I've used this to read the text file into a list:
with open('info.txt') as f:
    info = f.readlines()

for item in info:
    reportePaises = open('reportePaises.txt', 'w')
    reportePaises.write("%s\n" % item)
But when I want to write the elements of the list (info) into another text file, only info[1] is written (the email).
How can I write the entire list onto the text file?
with open('data.csv') as f:
    with open('test2.txt', 'a') as wp:
        for item in f.readlines():
            wp.write("%s" % item)
        wp.write('\n')  # adds a new line after the looping is done
That will give you:
Bruce
brucechungulloa#outlook.com
In both files.
You were having problems because every time you open a file with the 'w' flag, it is truncated on disk. So you created a new, empty file every time.
You should open the second file only once, in the with statement:
with open('info.txt') as f, open('reportePaises.txt', 'w') as reportePaises:
    info = f.readlines()
    for item in info:
        reportePaises.write(item)
As #Pynchia suggested, it's probably better not to use .readlines(), and to loop directly over the input file instead.
with open('info.txt') as f, open('reportePaises.txt', 'w') as reportePaises:
    for item in f:
        reportePaises.write(item)
This way you don't create a copy of the whole file in RAM by saving it to a list, which may cause a long delay if the file is big (and, obviously, uses more RAM). Instead, you treat the input file as an iterator and read the next line directly from disk on each iteration.
You also (if I did the testing right) don't need to append '\n' to every line; the newlines are already in item. Because of that, you don't need string formatting at all, just reportePaises.write(item).
You are opening your file in write mode every time you write to a file, effectively overwriting the previous line that you wrote. Use the append mode, a, instead.
reportePaises = open('reportePaises.txt', 'a')
Edit: Alternatively, you can open the file once and instead of looping through the lines, write the whole contents as follows:
with open('reportePaises.txt', 'w') as file:
    file.write(f.read())
Try this, without opening the output file again and again.
with open('info.txt') as f:
    info = f.readlines()

with open('reportePaises.txt', 'w') as f1:
    for x in info:
        f1.write(x)  # readlines() keeps the trailing '\n', so none is added here
That will work.
Two problems here. One is that you are opening the output file inside the loop, which means it is opened several times. Since you also use the "w" flag, the file is truncated to zero length each time it is opened; therefore you only get the last line written.
It would be better to open the output file once, outside the loop. You could even use an outer with block, as in the sketch below.
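A minimal sketch of that structure, using the filenames from the question:

with open('info.txt') as infile:
    with open('reportePaises.txt', 'w') as outfile:  # opened once, outside the loop
        for item in infile:
            outfile.write(item)  # each item already ends with '\n'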
You can simply try the code below. Your code did not work because you opened the file handle reportePaises within the for loop; you don't need to open it again and again. Try re-running your code line by line in the Python shell, as that makes it easy to track down bugs.
The below code will work
with open('something.txt') as f:
    info = f.readlines()

reportePaises = open('reportePaises.txt', 'w')
for item in info:
    reportePaises.write("%s" % item)
You don't need to add a \n to the output line, because when you call readlines the \n character is preserved in the info list. Observe:
with open('something.txt') as f:
    info = f.readlines()
print info
The output you will get is
['Bruce\n', 'brucechungulloa#outlook.com']
When editing the contents of a file I have been using the approach of:
Open the file in read mode
Convert file contents to a string with the .read() method and assign to another variable
Close the file
Do things to the string
Open the original file in write mode
Write the string to file
Close the file
For example:
fo = open('file.html', 'r')
fo_as_string = fo.read()
fo.close()
# # #
# do stuff to fo_as_string here
# # #
fo = open('file.html', 'w')
fo.write(fo_as_string)
fo.close()
I now find myself in a situation, however, where I need to remove any whitespace at the beginning of lines, and since I have converted the file object to a single string, I don't think I can target this whitespace at a 'line' level with string methods like lstrip and rstrip.
So I guess I am after advice on how to retain the flexibility of having the file contents as a string for manipulation, while also being able to target individual lines within the string when line-specific manipulation is required, as in the example above.
Use a for loop; a for loop over a file object returns one line at a time.
# use the `with` statement for handling files; it automatically closes the files for you
with open('file.html') as fo, open('file1.html', 'w') as fo1:
    for line in fo:  # reads one line at a time, memory efficient
        # do something with line, e.g. line = line.lstrip()
        fo1.write(line)  # write the line to fo1; `line` already ends with '\n'
If you're trying to modify the same file, then use the fileinput module:
import fileinput

for line in fileinput.input('file.html', inplace=True):
    # do something with line
    print line,  # trailing comma: `line` already ends with '\n'; this writes it back to 'file.html'
You can also get individual lines from file.read(); split it using:
fo_as_string = fo.read()
lines = fo_as_string.splitlines()
But file.read() loads the whole file into memory, so it is not very memory efficient.
Other alternatives are f.readlines() and list(f), both return a list of all lines from the file object.
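For what it's worth, the key difference between the two families is whether the newlines are kept; a quick illustration (the contents are made up):

text = "first line\nsecond line\n"
print text.splitlines()  # ['first line', 'second line'] -- newlines stripped
# f.readlines() or list(f) on a file with the same contents would instead give
# ['first line\n', 'second line\n'] -- newlines preserved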
Depending on the size of the file, and the processes you want to do to each line, there are a couple of answers that might work for you.
First, if you're intent on keeping the entire file in memory while you process it, you could save it as a list of lines, process some or all of the lines, and rejoin them with your standard line delimiter when you wish to write them to disk:
linesep = '\n'
with open('file.html', 'r') as fin:
    # splitlines() drops the newline characters, so the join() below won't double them
    input_lines = fin.read().splitlines()

# Do your per-line transformation
modified_lines = [line.lstrip() for line in input_lines]

# Join the lines into one string to do whole-string processing
whole_string = linesep.join(modified_lines)
# whatever full-string processing you're looking for, do here

# Write to disk
with open('file1.html', 'w') as output_file:
    output_file.write(whole_string)
Or you could specify your own line separator, and do the input parsing by hand:
linesep = '\n'
with open('file.html', 'r') as fin:
    input_lines_by_hand = fin.read().split(linesep)
I have a text document from which I would like to repeatedly remove the first line, every 30 seconds or so.
I have already written (or, more accurately, copied) the code for a Python resettable timer object that allows a function to be called every 30 seconds in a non-blocking way unless asked to reset or cancel:
Resettable timer in python repeats until cancelled
(If someone could check that the way I implemented the repeat is OK, it would be appreciated, because my Python sometimes crashes while running it. :))
I now want to write my function to load a text file, copy all but the first line, and then rewrite it to the same text file. I think I can do it this way... but is it the most efficient?
from collections import deque

def removeLine():
    with open(path, 'rU') as file:
        lines = deque(file)
    try:
        print lines.popleft()
    except IndexError:
        print "Nothing to pop?"
    with open(path, 'w') as file:
        file.writelines(lines)
This works, but is it the best way to do it ?
I'd use the fileinput module with inplace=True:
import fileinput

def removeLine():
    inputfile = fileinput.input(path, inplace=True, mode='rU')
    next(inputfile, None)  # skip a line *if present*
    for line in inputfile:
        print line,  # write out again, but without an extra newline
    inputfile.close()
inplace=True causes sys.stdout to be redirected to the open file, so we can simply 'print' the lines.
The next() call is used to skip the first line; giving it a default None suppresses the StopIteration exception for an empty file.
This makes rewriting a large file more efficient as you only need to keep the fileinput readlines buffer in memory.
I don't think a deque is needed at all, even for your solution; just use next() there too, then use list() to catch the remaining lines:
def removeLine():
    with open(path, 'rU') as file:
        next(file, None)  # skip a line *if present*
        lines = list(file)
    with open(path, 'w') as file:
        file.writelines(lines)
but this requires you to read all of the file into memory; don't do that with large files.
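For large files, a streaming variant would avoid holding everything in memory; this is a sketch only, and the temporary-file handling is my assumption rather than part of the original answer:

import os
import tempfile

def removeLineStreaming(path):
    # copy everything but the first line to a temp file, then swap it in
    fd, tmppath = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    with open(path, 'rU') as src, os.fdopen(fd, 'w') as dst:
        next(src, None)  # skip the first line *if present*
        for line in src:
            dst.write(line)
    os.rename(tmppath, path)  # atomic on POSIX; replaces the original file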
I have a 22 MB text file containing a list of numbers (one number per line). I am trying to have Python read each number, process it, and write the result to another file. This all works, but if I have to stop the program it starts over from the beginning. I tried to use a MySQL database at first, but it was way too slow; processing the file directly is about 4 times faster. I would like to be able to delete each line after its number has been processed.
import os

with open('list.txt', 'r') as file:
    for line in file:
        filename = line.rstrip('\n') + ".txt"
        if os.path.isfile(filename):
            print "File", filename, "exists, skipping!"
        else:
            # process number and write file
            # (need code to delete current line here)
            pass
As you can see, every time the program is restarted it has to check the hard drive for each file name to make sure it gets to the place it left off. With 1.5 million numbers this can take a while. I found an example with truncate, but it did not work.
Are there any commands similar to PHP's array_shift for Python that will work with text files?
I would use a marker file to keep the number of the last line processed instead of rewriting the input file:
import os

start_from = 0
try:
    with open('last_line.txt', 'r') as llf:
        start_from = int(llf.read())
except (IOError, ValueError):
    pass

with open('list.txt', 'r') as file:
    for i, line in enumerate(file):
        if i < start_from:
            continue
        filename = line.rstrip('\n') + ".txt"
        if os.path.isfile(filename):
            print "File", filename, "exists, skipping!"
        else:
            pass  # process number and write file
        with open('last_line.txt', 'w') as outfile:
            outfile.write(str(i))
This code first checks for the file last_line.txt and tries to read a number from it; the number is the index of the line that was processed during the previous attempt. Then it simply skips the required number of lines.
I use Redis for stuff like that. Install Redis and the redis-py client, and you can have a persistent set in memory. Then you can do:
import redis

r = redis.StrictRedis('localhost')
with open('list.txt', 'r') as file:
    for line in file:
        if r.sismember('done', line):
            continue
        else:
            # process number and write file
            r.sadd('done', line)
If you don't want to install Redis, you can also use the shelve module, making sure that you open it with the writeback=False option, as sketched below. I really recommend Redis, though; it makes things like this much easier.
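A rough sketch of the shelve route; the 'done.db' filename and the key scheme are assumptions, not part of the original answer:

import shelve

done = shelve.open('done.db', writeback=False)  # a persistent dict on disk
with open('list.txt', 'r') as file:
    for line in file:
        key = line.rstrip('\n')
        if key in done:
            continue  # already processed in an earlier run
        # process number and write file here
        done[key] = True
done.close()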
Reading the data file should not be a bottleneck. The following code reads a 36 MB, 697,997-line text file in about 0.2 seconds on my machine:
import time

start = time.clock()
with open('procmail.log', 'r') as f:
    lines = f.readlines()
end = time.clock()
print 'Readlines time:', end - start
It produced the following result:
Readlines time: 0.1953125
Note that this code produces a list of lines in one go.
To know where you've been, just write the number of lines you've processed to a file. Then if you want to try again, read all the lines and skip the ones you've already done:
import os

# Read the data file
with open('list.txt', 'r') as f:
    lines = f.readlines()

skip = 0
try:
    # Did we try earlier? If so, skip what has already been processed
    with open('lineno.txt', 'r') as lf:
        skip = int(lf.read())  # this should only be one number
    del lines[:skip]  # Remove already processed lines from the list
except (IOError, ValueError):
    pass

with open('lineno.txt', 'w+') as lf:
    for n, line in enumerate(lines):
        # Do your processing here.
        lf.seek(0)  # go to the beginning of lf
        lf.write(str(n + skip) + '\n')  # write the line number
        lf.flush()
        os.fsync(lf.fileno())  # flush and fsync make sure lineno.txt hits the disk
I am new to Python programming. I have a .txt file that looks like this:
0,Salary,14000
0,Bonus,5000
0,gift,6000
I want to replace the first '0' value with '1' in each line. How can I do this? Can anyone help me, with sample code?
Thanks in advance.
Nimmyliji
I know that you're asking about Python, but forgive me for suggesting that perhaps a different tool is better for the job. :) It's a one-liner via sed:
sed 's/^0,/1,/' yourtextfile.txt > output.txt
This applies the regex /^0,/ (which matches any 0, that occurs at the beginning of a line) to each line and replaces the matched text with 1, instead. The output is directed into the specified file, output.txt.
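If you'd rather stay in Python, the same substitution can be written with re.sub; a minimal sketch of an equivalent, not part of the original sed answer:

import re

with open('yourtextfile.txt') as src, open('output.txt', 'w') as dst:
    for line in src:
        dst.write(re.sub(r'^0,', '1,', line))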
inFile = open("old.txt", "r")
outFile = open("new.txt", "w")
for line in inFile:
    outFile.write(",".join(["1"] + line.split(",")[1:]))
inFile.close()
outFile.close()
If you would like something more general, take a look at the Python csv module. It contains utilities for processing comma-separated values (abbreviated as CSV) in files, but it can work with an arbitrary delimiter, not only the comma. As your sample is obviously a CSV file, you can use it as follows:
import csv
reader = csv.reader(open("old.txt"))
writer = csv.writer(open("new.txt", "w"))
writer.writerows(["1"] + line[1:] for line in reader)
To overwrite the original file with the new one:
import os
os.remove("old.txt")
os.rename("new.txt", "old.txt")
I think that writing to a new file and then renaming it is more fault-tolerant, and less likely to corrupt your data, than directly overwriting the source file. Imagine that your program raised an exception after the source file had already been read into memory and reopened for writing: you would lose the original data, and the new data wouldn't be saved because of the crash. With my approach you only lose the new data, while preserving the original.
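The write-then-rename idea can be packaged as a small helper; here's a sketch, with rewrite_safely being a hypothetical name used only for illustration:

import os

# rewrite_safely is a hypothetical helper illustrating the write-then-rename pattern
def rewrite_safely(path, tmp_path, transform):
    with open(path) as fin, open(tmp_path, 'w') as fout:
        for line in fin:
            fout.write(transform(line))
    # we only reach this point if every write succeeded, so the swap is safe
    os.remove(path)
    os.rename(tmp_path, path)

rewrite_safely('old.txt', 'new.txt', lambda line: '1' + line[1:])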
o = open("output.txt", "w")
for line in open("file"):
    s = line.split(",")
    s[0] = "1"
    o.write(','.join(s))
o.close()
Or you can use fileinput with in-place editing:
import fileinput

for line in fileinput.FileInput("file", inplace=1):
    s = line.split(",")
    s[0] = "1"
    print ','.join(s),  # trailing comma: the last field already ends with '\n'
f = open(filepath,'r')
data = f.readlines()
f.close()
edited = []
for line in data:
    edited.append('1' + line[1:])
f = open(filepath,'w')
f.writelines(edited)
f.flush()
f.close()
Or in Python 2.5+:
with open(filepath, 'r') as f:
    data = f.readlines()

with open(outfilepath, 'w') as f:
    for line in data:
        f.write('1' + line[1:])
This should do it. I wouldn't recommend it for a truly big file though ;-)
What is going on (ex 1):
1: Open the file in read mode
2,3: Read all the lines into a list (each line is a separate index) and close the file.
4,5,6: Iterate over the list, constructing a new list where each line has its first character replaced by a '1'. The line[1:] slices the string from index 1 onward; we concatenate the '1' with the truncated string.
7,8,9: Reopen the file in write mode, write the list to the file (overwrite), flush the buffer, and close the file handle.
In Ex. 2:
I use the with statement, which takes care of closing the file handles itself, but otherwise do essentially the same thing.