Printing specific lines of a txt file in Python

I have a text file I wish to analyze. I'm trying to find every line that contains certain characters (ex: "#") and then print the line located 3 lines before it (ex: if line 5 contains "#", I would like to print line 2)
This is what I got so far:
file = open('new_file.txt', 'r')
a = list()
x = 0
for line in file:
    x = x + 1
    if '#' in line:
        a.append(x)
        continue
x = 0
for index, item in enumerate(a):
    for line in file:
        x = x + 1
        d = a[index]
        if x == d - 3:
            print line
            continue
It won't work (it prints nothing when I feed it a file that has lines containing "#"), any ideas?

First, you are going through the file multiple times without re-opening it for subsequent passes. A file object is an iterator; once it is exhausted, every subsequent attempt to iterate it terminates immediately without reading anything.
Second, your indexing logic is a little convoluted. Assuming your files are not huge relative to your memory size, it is much easier to simply read the whole file into memory (as a list) and manipulate it there.
myfile = open('new_file.txt', 'r')
a = myfile.readlines()
for index, item in enumerate(a):
    if '#' in item and index - 3 >= 0:
        print a[index - 3].strip()
This has been tested on the following input:
PrintMe
PrintMe As Well
Foo
#Foo
Bar#
hello world will print
null
null
##

Ok, the issue is that you have already iterated completely through the file object file in your first loop by the time you try again in the second loop. So the second loop over file is an empty loop. Maybe it would be a better idea to iterate the file only once and remember the last few lines...
file = open('new_file.txt', 'r')
a = ["", "", ""]
for line in file:
    if "#" in line:
        print(a[0], end="")
    a.append(line)
    a = a[1:]
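The same rolling-buffer idea can be written with collections.deque, which handles the fixed-size window automatically. A small self-contained sketch, using inline sample data in place of the question's new_file.txt:

```python
import io
from collections import deque

# Inline stand-in for the question's new_file.txt.
text = io.StringIO("line1\nline2\nline3\nline4 #\nline5\n")

recent = deque(maxlen=3)  # keeps only the last three lines seen
for line in text:
    if '#' in line and len(recent) == 3:
        print(recent[0], end='')  # the line three before the match
    recent.append(line)
# prints: line1
```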

For file IO it is usually efficient, in both programmer time and runtime, to use regular expressions to match patterns while iterating through the lines of the file. Done that way, your problem really isn't a problem.
import re
file = open('new_file.txt', 'r')
document = file.read()
lines = document.split("\n")
LinesOfInterest = []
for lineNumber, line in enumerate(lines):
    WhereItsAt = re.search(r'#', line)
    if lineNumber > 2 and WhereItsAt:
        LinesOfInterest.append(lineNumber - 3)
print(LinesOfInterest)
for lineNumber in LinesOfInterest:
    print(lines[lineNumber])
LinesOfInterest is now a list of line numbers matching your criteria.
I used
line1,0
line2,0
line3,0
#
line1,1
line2,1
line3,1
#
line1,2
line2,2
line3,2
#
line1,3
line2,3
line3,3
#
as input yielding
[0, 4, 8, 12]
line1,0
line1,1
line1,2
line1,3


Read text line by line in python

I would like to make a script that reads a text file line by line and, if it finds a certain parameter in a line, populates an array. The idea is this:
Read line
if Condition 1
    #True
    nested if Condition 2
    ...
else Condition 1 is not true
    read next line
I can't get it to work though. I'm using readline() to read the text line by line, but the main problem is that I can never get it to advance to the next line. Can you help me? Below is an extract of my actual code:
col = 13 # colonne
rig = 300 # righe
a = [ [ None for x in range(col) ] for y in range(rig) ]
counter = 1
file = open('temp.txt', 'r')
files = file.readline()
for line in files:
    if 'bandEUTRA: 32' in line:
        if 'ca-BandwidthClassDL-EUTRA: a' in line:
            a[counter][5] = 'DLa'
            counter = counter + 1
        else:
            next(files)
    else:
        next(files)
print('\n'.join(map(str, a)))
Fixes for the code you asked about inline, and some other associated cleanup, with comments:
col = 13 # colonne
rig = 300 # righe
a = [[None] * col for y in range(rig)]  # Innermost repeated list of immutable
                                        # can use multiplication, just don't do it for
                                        # outer list(s), see: https://stackoverflow.com/q/240178/364696
counter = 1
with open('temp.txt') as file:  # Use with statement to get guaranteed file closure; 'r' is implicit mode and can be omitted
    # Removed: files = file.readline() # This makes no sense; files would be a single line from the file, but your original code treats it as the lines of the file
    # Replaced: for line in files: # Since files was a single str, this iterated characters of the file
    for line in file:  # File objects are iterators of their own lines, so you can get the lines one by one this way
        if 'bandEUTRA: 32' in line and 'ca-BandwidthClassDL-EUTRA: a' in line:  # Perform both tests in single if to minimize arrow pattern
            a[counter][5] = 'DLa'
            counter += 1  # May as well not say "counter" twice and use +=
    # All next() code removed; next() advances an iterator and returns the next value,
    # but files was not an iterator, so it was nonsensical, and the new code uses a for loop that advances it for you, so it was unnecessary.
    # If the goal is to intentionally skip the next line under some conditions, you *could*
    # use next(file, None) to advance the iterator so the for loop will skip it, but
    # it's rare that a line *failing* a test means you don't want to look at the next line
    # so you probably don't want it

# This works:
print('\n'.join(map(str, a)))
# But it's even simpler to spell it as:
print(*a, sep="\n")
# which lets print do the work of stringifying and inserting the separator, avoiding
# the need to make a potentially huge string in memory; it *might* still do so (no documented
# guarantees), but if you want to avoid that possibility, you could do (after import sys):
sys.stdout.writelines(map('{}\n'.format, a))
# which technically doesn't guarantee it, but definitely actually operates lazily, or
for x in a:
    print(x)
# which is 100% guaranteed not to make any huge strings
You can do:
with open("filename.txt", "r") as f:
    for line in f:
        clean_line = line.rstrip('\r\n')
        process_line(clean_line)
Edit:
for your application of populating an array, you could do something like this:
with open("filename.txt", "r") as f:
    contains = ["text" in l for l in f]
This will give you a list whose length is the number of lines in filename.txt; its contents will be False for each line that doesn't contain "text", and True for each line that does.
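For instance (a small self-contained sketch, using io.StringIO in place of a real file):

```python
import io

# Stand-in for filename.txt; "text" is the search term from the answer.
f = io.StringIO("some text here\nnothing here\nmore text\n")
contains = ["text" in l for l in f]
print(contains)  # [True, False, True]
```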
Edit 2: To reflect @ShadowRanger's comments, I've changed my code to iterate over each line in the file rather than reading the whole thing into memory at once.

How to read multiple lines at once? (practice script) [duplicate]

This question already has answers here:
How to read a file in reverse order?
(22 answers)
Closed 2 years ago.
I am making a .fasta parser script. (I know some parsers already exist for .fasta files but I need practice dealing with large files and thought this was a good start).
The goal of the program: take a very large .fasta file with multiple sequences, and write the reverse complement of each sequence to a new file.
I have a script that reads it one line at a time, but each line is only about 50 bytes in a regular .fasta file, so buffering only one line at a time is not necessary. Is there a way I can set the number of lines to buffer at a time?
For people who don't know what fasta is: A .fasta file is a text file with DNA sequences each with a header line before the DNA/RNA/Protein sequence marked by a '>'. Example:
>Sequence_1
ATG
TATA
>Sequence_2
TATA
GACT
ATG
Overview of my code:
Read the first byte of each line to map the locations of the sequences in the file by finding the '>'s.
Use this map to read each sequence backwards line by line (this is what I want to change)
Reverse complement the base pairs in the string (i.e. G->C)
Write this line to a new file
Here is what should be all the relevant code so you can see the functions the lines I am trying to change are calling: (can probably skip)
def get_fasta_seqs(BIGFILE): #Returns an object containing all of the seq locations
    f = open(BIGFILE)
    cnt = 0
    seq_name = []
    seq_start = []
    seq_end = []
    seqcount = 0
    #print(line)
    #for loop skips first line check for seq name
    if f.readline(1) == '>':
        seq_name.append(cnt)
        seq_start.append(cnt+1)
        seqcount =+ 1
    for line in f:
        cnt += 1
        if f.readline(1) == '>':
            seq_name.append(cnt)
            seq_start.append(cnt+1)
            seqcount += 1
            if seqcount > 1:
                seq_end.append(cnt-1)
    seq_end.append(cnt-1) #add location of final line
    seqs = fileseq(seq_name, seq_start, seq_end, seqcount) #This class only has an __init__ function for these lists
    return seqs

def fasta_rev_compliment(fasta_read, fasta_write = "default", NTtype = "DNA"):
    if fasta_write == 'default':
        fasta_write = fasta_read[:-6] + "_RC.fasta"
    seq_map = get_fasta_seqs(fasta_read)
    print(seq_map.seq_name)
    f = open(fasta_write, 'a')
    for i in range(seq_map.seqcount): #THIS IS WHAT I WANT TO CHANGE
        line = getline(fasta_read, seq_map.seq_name[i]+1) #getline is reading it as 1 indexed?
        f.write(line)
        my_fasta_seqs = get_fasta_seqs(fasta_read)
        for seqline in reversed(range(seq_map.seq_start[i], seq_map.seq_end[i]+1)):
            seq = getline(fasta_read, seqline+1)
            seq = seq.replace('\n','')
            seq = reverse_compliment(seq, NTtype = NTtype) #this function just returns the reverse complement for that line
            seq = seq + '\n'
            f.write(seq)
    f.close()

fasta_rev_compliment('BIGFILE.fasta')
The main bit of code I want to change is here:
for i in range(seq_map.seqcount): #THIS IS WHAT I WANT TO CHANGE
    line = getline(fasta_read, seq_map.seq_name[i]+1) #getline is reading it as 1 indexed?
    f.write(line)
    my_fasta_seqs = get_fasta_seqs(fasta_read)
    for seqline in reversed(range(seq_map.seq_start[i], seq_map.seq_end[i]+1)):
        seq = getline(fasta_read, seqline+1)
I want something like this:
def fasta_rev_compliment(fasta_read, fasta_write = "default", NTtype = "DNA", lines_to_record_before_flushing = 5):
    ###MORE CODE###
    #i want something like this
    for i in range(seq_map.seqcount): #THIS IS WHAT I WANT TO CHANGE
        #is there a way to load
        line = getline(fasta_read, seq_map.seq_name[i]+1) #getline is reading it as 1 indexed?
        f.write(line)
        my_fasta_seqs = get_fasta_seqs(fasta_read)
        for seqline in reversed(range(seq_map.seq_start[i], seq_map.seq_end[i]+1)):
            seq = getline(fasta_read, seqline+1)
            #Repeat n = 5 (or other specified number) times until flushing ram.
The problem I am running into is the fact that I need to read the file backwards. All of the methods I can find don't work well when you try to apply them to reading a file backwards. Is there something that can read a file in chunks but backwards?
Or: Anything else that can make this more effective for a low memory setup. Right now it uses hardly any memory, but takes 21secs for a 100kB file with about 12,000 lines, but processes the file instantly using the file.readlines() method.
Here is an example of obtaining the reverse complement of a fasta file. Perhaps you can use some ideas from this.
import re

file = """\
>Sequence_1
ATG
TATA
>Sequence_2
TATA
GACT
ATG""".splitlines()

s = ''
for line in file:
    line = line.rstrip()
    if line.startswith('>'):
        if len(s):
            # complement the sequence of fasta 'TAGC' to 'ATCG'
            # T to A, A to T, G to C, C to G
            s = s.translate(str.maketrans('TAGC', 'ATCG'))
            # reverse the string, 's[::-1]'
            # Also, print up to 50 fasta per line to the end of the sequence
            s = re.sub(r'(.{1,50})', r'\1\n', s[::-1])
            print(s, end='')
            s = ''
        print(line)
    else:
        s += line

# print last sequence
s = s.translate(str.maketrans('TAGC', 'ATCG'))
s = re.sub(r'(.{1,50})', r'\1\n', s[::-1])
print(s, end='')
Prints:
>Sequence_1
TATACAT
>Sequence_2
CATAGTCTATA
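As for the original ask of reading a file in chunks but backwards, one low-memory approach is to seek from the end and yield fixed-size blocks. This is only a sketch (the function name and block size are mine, not from the question); a caller still has to stitch together lines that span block boundaries:

```python
import os

def read_blocks_reversed(path, block_size=4096):
    """Yield raw blocks of a file from the end toward the start."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)   # jump to end of file
        pos = f.tell()
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)          # back up one block and read it
            yield f.read(step)

# Usage sketch: joining the blocks in reverse yield order
# reconstructs the original file contents.
# for block in read_blocks_reversed('BIGFILE.fasta'):
#     ...
```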

Locate a specific line in a file based on user input then delete a specific number of lines

I'm trying to delete specific lines in a text file. The way I need to go about it: prompt the user to input a string (a phrase that should exist in the file), search the file, and if the string is there, store the text of that line and its line number.
After the phrase has been found, it and the five following lines are printed out. Now I have to figure out how to delete those six lines without changing any other text in the file, which is my issue lol.
Any ideas as to how I can delete those six lines?
This was my latest attempt to delete the lines
file = open('C:\\test\\example.txt', 'a')
locate = "example string"
for i, line in enumerate(file):
    if locate in line:
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        break
Usually I would not think it's desirable to overwrite the source file - what if the user does something by mistake? If your project allows, I would write the changes out to a new file.
with open('source.txt', 'r') as ifile:
    with open('output.txt', 'w') as ofile:
        locate = "example string"
        skip_next = 0
        for line in ifile:
            if locate in line:
                skip_next = 5   # the matched line plus five more make six skipped lines
                print(line.rstrip('\n'))
            elif skip_next > 0:
                print(line.rstrip('\n'))
                skip_next -= 1
            else:
                ofile.write(line)
This is also robust to finding the phrase multiple times - it will just start counting lines to remove again.
You can find the occurrences, copy the list items between the occurrences to a new list and then save the new list into the file.
_newData = []
_linesToSkip = 6   # the matched line plus the five after it
with open('data.txt', 'r') as _file:
    data = _file.read().splitlines()
occurrences = [i for i, x in enumerate(data) if "example string" in x]
_lastOcurrence = 0
for ocurrence in occurrences:
    _newData.extend(data[_lastOcurrence : ocurrence])
    _lastOcurrence = ocurrence + _linesToSkip
_newData.extend(data[_lastOcurrence:])
# Save new data into the file
with open('data.txt', 'w') as _file:
    _file.write('\n'.join(_newData) + '\n')
There are a couple of points that you clearly misunderstand here:
.strip() removes whitespace or given characters:
>>> print(str.strip.__doc__)
S.strip([chars]) -> str

Return a copy of the string S with leading and trailing
whitespace removed.

If chars is given and not None, remove characters in chars instead.
incrementing i doesn't actually do anything:
>>> for i, _ in enumerate('ignore me'):
...     print(i)
...     i += 10
...
0
1
2
3
4
5
6
7
8
You're assigning to the ith element of the line, which should raise an exception (that you neglected to tell us about)
>>> line = 'some text'
>>> line[i] = line.strip()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Ultimately...
You have to write to a file if you want to change its contents. Writing to a file that you're reading from is tricky business. Writing to an alternative file, or just storing the file in memory if it's small enough is a much healthier approach.
search_string = 'example'
lines = []
with open('/tmp/fnord.txt', 'r+') as f:  # `r+` so we can read *and* write to the file
    for line in f:
        line = line.strip()
        if search_string in line:
            print(line)
            for _ in range(5):
                print(next(f).strip())
        else:
            lines.append(line)
    f.seek(0)      # back to the beginning!
    f.truncate()   # goodbye, original lines
    for line in lines:
        print(line, file=f)  # python2 requires `from __future__ import print_function`
There is a fatal flaw in this approach, though - if the sought after line is any closer than the 6th line from the end, it's going to have problems. I'll leave that as an exercise for the reader.
You are appending to your file by using open with 'a'. Also, you are not closing your file (bad habit). str.strip() does not delete the line, it removes whitespace by default. Also, this would usually be done in a loop.
This to get started:
locate = "example string"
n = 0
with open('example.txt', 'r+') as f:
    for i, line in enumerate(f):
        if locate in line:
            n = 6
        if n:
            print(line, end='')
            n -= 1
print("done")
Edit:
Read-modify-write solution:
locate = "example string"
filename = 'example.txt'
removelines = 5

with open(filename) as f:
    lines = f.readlines()

with open(filename, 'w') as f:
    n = 0
    for line in lines:
        if locate in line:
            n = removelines + 1
        if n:
            n -= 1
        else:
            f.write(line)

python delete specific line and re-assign the line number

I would like delete specific line and re-assign the line number:
eg:
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
what I want: if line 1 is the line need to be delete, then
output should be:
0,abc,def
1,mno,pqr
2,stu,vwx
What I have done so far:
f = open(file,'r')
lines = f.readlines()
f.close()
f.open(file,'w')
for line in lines:
    if line.rsplit(',')[0] != 'line#':
        f.write(line)
f.close()
The above lines can delete a specific line number, but I don't know how to rewrite the line numbers before the first ','.
Here is a function that will do the job.
def removeLine(n, file):
    f = open(file, "r+")
    d = f.readlines()
    f.seek(0)
    for i in range(len(d)):
        if i > n:
            f.write(d[i].replace(d[i].split(",")[0], str(i - 1)))
        elif i != n:
            f.write(d[i])
    f.truncate()
    f.close()
Where the parameters n and file are the line you wish to delete and the filepath respectively.
This is assuming the line numbers are written in the line as implied by your example input.
If the number of the line is not included at the beginning of each line, as some other answers have assumed, simply remove the first if statement:
if i > n:
    f.write(d[i].replace(d[i].split(",")[0], str(i - 1)))
I noticed that your account wasn't created in the past few hours, so I figure that there's no harm in giving you the benefit of the doubt. You will really have more fun on StackOverflow if you spend the time to learn its culture.
I wrote a solution that fits your question's criteria on a file that's already written (you mentioned that you're opening a text file), so I assume it's a CSV.
I figured that I'd answer your question differently than the other solutions that implement the CSV reader library and use a temporary file.
import re

numline_csv = re.compile(r"\d,")

# substitute your actual file opening here
so_31195910 = """
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
"""
so = so_31195910.splitlines()

# this could be an input or whatever you need
delete_line = 1

line_bank = []
for l in so:
    if l and not l.startswith(str(delete_line) + ','):
        print(l)
        l = re.split(numline_csv, l)
        line_bank.append(l[1])

so = []
for i, l in enumerate(line_bank):
    so.append("%s,%s" % (i, l))
And the output:
>>> so
['0,abc,def', '1,mno,pqr', '2,stu,vwx']
In order to get a line number for each line, you should use the enumerate method...
for line_index, line in enumerate(lines):
    # line_index is 0 for the first line, 1 for the 2nd line, etc.
In order to separate the first element of the string from the rest of the string, I suggest passing a value for maxsplit to the split method.
>>> '0,abc,def'.split(',')
['0', 'abc', 'def']
>>> '0,abc,def'.split(',',1)
['0', 'abc,def']
>>>
Once you have those two, it's just a matter of concatenating line_index to split(',',1)[1].
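Putting those two pieces together, a minimal sketch of the delete-and-renumber (the input rows and deleted line number are taken from the question):

```python
lines = ["0,abc,def", "1,ghi,jkl", "2,mno,pqr", "3,stu,vwx"]
delete_number = '1'  # line number to delete, as it appears before the first comma

# Keep every row whose leading number isn't the one being deleted,
# then renumber with enumerate.
kept = [l.split(',', 1)[1] for l in lines if l.split(',', 1)[0] != delete_number]
renumbered = ['%d,%s' % (i, rest) for i, rest in enumerate(kept)]
print(renumbered)  # ['0,abc,def', '1,mno,pqr', '2,stu,vwx']
```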

Python: Copying lines that meet requirements

So, basically, I need a program that opens a .dat file, checks each line to see if it meets certain prerequisites, and if they do, copy them into a new csv file.
The prerequisites are that it must 1) contain "$W" or "$S" and 2) have the last value at the end of the line of the DAT say one of a long list of acceptable terms. (I can simply make-up a list of terms and hardcode them into a list)
For example, if the CSV was a list of purchase information and the last item was what was purchased, I would only want to include fruit. In this case, the last item is an ID tag, and I only want to accept a handful of ID tags; there is a list of about 5 acceptable tags. The tags have very variable length, but they are always the last item on the line (and always the 4th item on the list).
Let me give a better example, again with the fruit.
My original .DAT might be:
DGH$G$H $2.53 London_Port Gyro
DGH.$WFFT$Q5632 $33.54 55n39 Barkdust
UYKJ$S.52UE $23.57 22#3 Apple
WSIAJSM_33$4.FJ4 $223.4 Ha25%ek Banana
Only the line: "UYKJ$S $23.57 22#3 Apple" would be copied because only it has both 1) $W or $S (in this case a $S) and 2) The last item is a fruit. Once the .csv file is made, I am going to need to go back through it and replace all the spaces with commas, but that's not nearly as problematic for me as figuring out how to scan each line for requirements and only copy the ones that are wanted.
I am making a few programs all very similar to this one, that open .dat files, check each line to see if they meet requirements, and then decides to copy them to the new file or not. But sadly, I have no idea what I am doing. They are all similar enough that once I figure out how to make one, the rest will be easy, though.
EDIT: The .DAT files are a few thousand lines long, if that matters at all.
EDIT 2: Some of my current code snippets
Right now, my current version is this:
def main():
    #NewFile_Loc = C:\Users\J18509\Documents
    OldFile_Loc = raw_input("Input File for MCLG:")
    OldFile = open(OldFile_Loc,"r")
    OldText = OldFile.read()
    # for i in range(0, len(OldText)):
    #     if (OldText[i] != " "):
    #         print OldText[i]
    i = split_line(OldText)
    if u'$S' in i:
        # $S is in the line
        print i
main()
But it's very choppy still. I'm just learning python.
Brief update: the server I am working on is down, and might be for the next few hours, but I have my new code, which has syntax errors in it, but here it is anyways. I'll update again once I get it working. Thanks a bunch everyone!
import os
NewFilePath = "A:\test.txt"
Acceptable_Values = ('Apple','Banana')

#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    NewFile = open(NewFilePath, 'w')
    NewFile.write('Header 1,','Name Header,','Header 3,','Header 4)
    OldFile_Loc = raw_input("Input File for Program:")
    OldFile = open(OldFile_Loc,"r")
    for line in OldFile:
        LineParts = line.split()
        if (LineParts[0].find($W)) or (LineParts[0].find($S)):
            if LineParts[3] in Acceptable_Values:
                print(LineParts[1], ' is accepted')
                #This Line is acceptable!
                NewFile.write(LineParts[1],',',LineParts[0],',',LineParts[2],',',LineParts[3])
    OldFile.close()
    NewFile.close()
main()
There are two parts you need to implement: first, read a file line by line and write out the lines meeting specific criteria. This is done by
with open('file.dat') as f:
    for line in f:
        stripped = line.strip() # remove '\n' from the end of the line
        if test_line(stripped):
            print stripped # Write to stdout
The criteria you want to check for are implemented in the function test_line. To check for the occurrence of "$W" or "$S", you can simply use the in-Operator like
if not '$W' in line and not '$S' in line:
    return False
else:
    return True
To check, if the last item in the line is contained in a fixed list, first split the line using split(), then take the last item using the index notation [-1] (negative indices count from the end of a sequence) and then use the in operator again against your fixed list. This looks like
items = line.split() # items is an array of strings
last_item = items[-1] # take the last element of the array
if last_item in ['Apple', 'Banana']:
    return True
else:
    return False
Now, you combine these two parts into the test_line function like
def test_line(line):
    if not '$W' in line and not '$S' in line:
        return False
    items = line.split() # items is an array of strings
    last_item = items[-1] # take the last element of the array
    if last_item in ['Apple', 'Banana']:
        return True
    else:
        return False
Note that the program writes the result to stdout, which you can easily redirect. If you want to write the output to a file, have a look at Correct way to write line to file in Python
inlineRequirements = ['$W','$S']
endlineRequirements = ['Apple','Banana']

inputFile = open(input_filename,'rb')
outputFile = open(output_filename,'wb')

for line in inputFile.readlines():
    line = line.strip()
    #trailing and leading whitespace has been removed
    if any(req in line for req in inlineRequirements):
        #passed inline requirement
        lastWord = line.split(' ')[-1]
        if lastWord in endlineRequirements:
            #passed endline requirement
            outputFile.write(line.replace(' ',','))
            #replaced spaces with commas and wrote to file

inputFile.close()
outputFile.close()
tags = ['apple', 'banana']
match = ['$W', '$S']

OldFile_Loc = raw_input("Input File for MCLG:")
OldFile = open(OldFile_Loc,"r")
for line in OldFile.readlines(): # Loop through the file
    line = line.strip() # Remove the newline and whitespace
    if line and not line.isspace(): # If the line isn't empty
        lparts = line.split() # Split the line
        if any(tag.lower() == lparts[-1].lower() for tag in tags) and any(c in line for c in match):
            # $S or $W is in the line AND the last section is in tags (case insensitive)
            print line
import re
list_of_fruits = ["Apple","Banana",...]
with open('some.dat') as f:
    for line in f:
        if re.findall(r"\$[SW]", line) and line.split()[-1] in list_of_fruits:
            print "Found:%s" % line
import os
NewFilePath = "A:\\test.txt"
Acceptable_Values = ('Apple','Banana')

#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    NewFile = open(NewFilePath, 'w')
    NewFile.write('Header 1,Name Header,Header 3,Header 4\n')
    OldFile_Loc = raw_input("Input File for Program:")
    OldFile = open(OldFile_Loc,"r")
    for line in OldFile:
        LineParts = line.split()
        # use 'in' here; str.find() returns 0/-1 and is unreliable in a boolean test
        if ('$W' in LineParts[0]) or ('$S' in LineParts[0]):
            if LineParts[3] in Acceptable_Values:
                print(LineParts[1], ' is accepted')
                #This Line is acceptable!
                NewFile.write(LineParts[1] + ',' + LineParts[0] + ',' + LineParts[2] + ',' + LineParts[3] + '\n')
    OldFile.close()
    NewFile.close()
main()
This worked great, and has all the capabilities I needed. The other answers are good, but none of them do 100% of what I needed like this one does.
