How does this enumerate work? I want a specific starting index, but the loop goes too far (index out of range).
def endingIndexOfTable(file, index):
    r = re.compile('^V.*(.).*(.).*(.).*(-).*(-).*(.).*(.).*(:).*$')
    for i, line in enumerate(file, start= index):
        if list(filter(r.match, line)) or "Sales Tax" in line:
            return i
I want my program to start searching from line index and to return the line where I find the string I am looking for.
I don't think you can start at a specific line of a file. I think you have to skip all the preceding lines first:
def endingIndexOfTable(file, index):
    r = re.compile('^V.*(.).*(.).*(.).*(-).*(-).*(.).*(.).*(:).*$')
    for i, line in enumerate(file):
        if i >= index:
            if list(filter(r.match, line)) or "Sales Tax" in line:
                return i
Although, did you mean return line?
Then, the version with islice should be like this:
from itertools import islice
def endingIndexOfTable(file, index):
    r = re.compile('^V.*(.).*(.).*(.).*(-).*(-).*(.).*(.).*(:).*$')
    for i, line in islice(enumerate(file), index, None):
        if list(filter(r.match, line)) or "Sales Tax" in line:
            return i
(again assuming that both the regex and the return are correct)
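A small sketch of the difference, using an in-memory list as a stand-in for the file (the list and variable names are illustrative only). The start argument of enumerate only changes the numbers it hands out; every line is still visited from the top, so if the returned i is later used to index a list of the file's lines, it can overshoot by index. islice is what actually skips lines:

```python
from itertools import islice

lines = ["a\n", "b\n", "c\n", "d\n"]  # stand-in for the lines of a file

# start= only relabels: all four lines are visited, numbered from 2
relabelled = list(enumerate(lines, start=2))

# islice drops the first 2 (index, line) pairs; enumerate keeps the true indices
skipped = list(islice(enumerate(lines), 2, None))

print(relabelled)  # [(2, 'a\n'), (3, 'b\n'), (4, 'c\n'), (5, 'd\n')]
print(skipped)     # [(2, 'c\n'), (3, 'd\n')]
```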
EDIT
I screwed up in the same way as the OP. This does not answer the question.
I don't want to deal with your regex, but here's one way to achieve the logic you need for searching from a specific line. It loads the entire file into memory, though, rather than reading only from that line onward.
poem.txt is just the file I used to test. Contents:
Author of the poem is: Me
poem is called: Test
AAFgz
S2zergtrxbhcn
Dzrgxt
Frhgc
Gzxcnhvjzx
xghrfcan a
jvzxhdyrfcv
kh
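def read_by_line(file, index):
    # readlines() loads the whole file; the [index:] slice skips the earlier lines,
    # and start=index keeps i equal to the real (0-based) line number
    for i, line in enumerate(file.readlines()[index:], start=index):
        print(line)
        if "a" in line: # if condition could have been your regex stuff
            return i

with open('poem.txt', 'r') as file_object:
    print(read_by_line(file_object, 5))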
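If the file is large, a streaming variant avoids readlines() by letting islice read and discard the lines before index, with enumerate(start=index) keeping the numbering correct. This is a sketch only: the function name ending_index_of_table and the pattern/needle parameters are hypothetical, and it matches the regex against the whole line (the original filter(r.match, line) applies it to each character, which is probably not what was intended):

```python
import re
from itertools import islice


def ending_index_of_table(file, index, pattern, needle="Sales Tax"):
    """Return the 0-based number of the first line at or after `index` that matches.

    Lines before `index` are consumed and thrown away by islice, so the file
    is streamed instead of being loaded into memory.
    """
    regex = re.compile(pattern)
    for i, line in enumerate(islice(file, index, None), start=index):
        # regex.match tests the start of the whole line, not each character
        if regex.match(line) or needle in line:
            return i
    return None  # no match before the end of the file
```

Usage would be something like ending_index_of_table(f, 7, your_regex) on an open file object f.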
What I'm trying to do is match a phrase in a text file, then print that line (this works fine). I then need to move the cursor up 4 lines so I can do another match on that line, but I can't get the seek() method to move up 4 lines from the matched line so that I can do another regex search. All I can seem to do with seek() is search from the very end of the file or the beginning. It doesn't seem to let me just do seek(105,1) from the line that is matched.
### This is the example test.txt
This is 1st line
This is 2nd line # Needs to seek() to this line from the 6th line. This needs to be dynamic as it won't always be 4 lines.
This is 3rd line
This is 4th line
This is 5th line
This is 6th line # Matches this line, now need to move it up 4 lines to the "2nd line"
This is 7 line
This is 8 line
This is 9 line
This is 10 line
#
def Findmatch():
    file = open("test.txt", "r")
    print file.tell() # shows 0 which is the beginning of the file
    string = file.readlines()
    for line in string:
        if "This is 6th line" in line:
            print line
            print file.tell() # shows 171 which is the end of the file. I need for it to be on the line that matches my search which should be around 108. seek() only lets me search from end or beginning of file, but not from the line that was matched.
Findmatch()
Since you've read all of it into memory at once with file.readlines(), the tell() method does indeed correctly point to the end, and you already have all your lines in a list. If you still wanted to use seek(), you'd have to read the file line by line and record the position within the file of each line start, so that you could go back four lines.
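A minimal sketch of that record-and-seek approach, written for Python 3 and assuming a UTF-8 file. It opens the file in binary mode because in text mode tell() returns an opaque value and cannot be called while iterating with for; in binary mode it is a plain byte offset. The function name line_back_from_match and its lines_back parameter are hypothetical; lines_back covers the "won't always be 4 lines" requirement:

```python
def line_back_from_match(path, needle, lines_back=4):
    """Return the line `lines_back` lines above the first line containing `needle`.

    Records the byte offset of every line start with tell(), then seek()s back.
    Returns None if `needle` never appears.
    """
    offsets = []  # offsets[k] is the byte position where line k starts
    target = needle.encode("utf-8")  # binary-mode lines are bytes
    with open(path, "rb") as f:
        while True:
            start = f.tell()  # position of the line about to be read
            line = f.readline()
            if not line:  # b"" only at end of file
                return None
            offsets.append(start)
            if target in line:
                # clamp at 0 so a match near the top never seeks before line 0
                back = max(len(offsets) - 1 - lines_back, 0)
                f.seek(offsets[back])  # absolute seek to that line's start
                return f.readline().decode("utf-8").rstrip("\r\n")
```

With the test.txt above, line_back_from_match("test.txt", "This is 6th line") should give the 2nd line. Only the offsets are kept, not the lines themselves.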
For the problem you describe, you can first find the index of the first matching line and then do the second operation starting from the list slice four items before that.
Here is a very rough example of that (the return None isn't really needed; it's just there for explicitness, clearly stating the intended behavior. Raising an exception might be just as desirable, depending on what the overall plan is):
def relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            break # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0 # Just return all lines up to now? Or was that broken input and fail?
        return lines[idx:]
    else:
        return None

with open("test.txt") as in_file:
    lines = in_file.readlines()
    print(''.join(relevant("This is 6th line", lines)))
Please also note: it's a bit confusing to name a list of lines string (one would probably expect a str there); go with lines or something else. It's also not advisable (especially since you indicate you're using 2.7, where file is a built-in) to give your variables names already used by built-ins, like file. Use in_file, for instance.
EDIT: As requested in a comment, here is a printing-only example. I'm adding it alongside the former one, since the former seems potentially more useful for further extension. :)
def print_relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            print(line.rstrip('\n'))
            break # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0 # Just return all lines up to now? Or was that broken input and fail?
        print(lines[idx].rstrip('\n'))

with open("test.txt") as in_file:
    lines = in_file.readlines()
    print_relevant("This is 6th line", lines)
Note: since lines are read in with their trailing newlines and print adds one of its own, I've rstrip'ed each line before printing. Just be aware of it.
First time really using files and I/O here. I'm running my code through a tester, and the tester passes the different files I'm working with to my code. So I'm representing the file as "filename" below and the string I'm looking for in that file as "s". I'm pretty sure I'm going through the lines of the file and searching for the string correctly. This is what I have for that:
def locate(filename, s):
    file= open(filename)
    line= file.readlines()
    for s in line:
        if s in line:
            return [line.count]
I'm aware the return line isn't correct. How would I return, as a list, the numbers of the lines on which the string I'm looking for is located?
You can use enumerate to keep track of the line number:
def locate(filename, s):
    with open(filename) as f:
        return [i for i, line in enumerate(f, 1) if s in line]
If the searched string is found on the first and third lines, it will produce the following output:
[1, 3]
You can use enumerate.
Sample Text File
hello hey s hi
hola
s
Code
def locate(filename, letter_to_find):
    locations = []
    with open(filename, 'r') as f:
        for line_num, line in enumerate(f):
            for word in line.split(' '):
                if letter_to_find in word:
                    locations.append(line_num)
                    break  # record each line only once, even if several words match
    return locations
Output
[0, 2]
As we can see, it shows that the string s is on lines 0 and 2.
Note that enumerate starts counting at 0 by default (use enumerate(f, 1) for 1-based line numbers).
What's going on
Opens the file in read mode.
Iterates over each line, enumerating them as it goes and keeping track of the line number in line_num.
Iterates over each word in the line.
If the letter_to_find that you passed into the function is in word, it appends line_num to locations and moves on to the next line.
Returns locations.
These are the problem lines:
for s in line:
    if s in line:
You have to read each line into a variable other than s; otherwise the loop overwrites the s you passed in.
def locate(filename, s):
    file= open(filename)
    line= file.readlines()
    index = 0
    for l in line:
        print(l)
        index = index + 1
        if s in l:
            return index
print(locate("/Temp/s.txt","s"))
I would like to delete a specific line and re-assign the line numbers:
e.g.:
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
What I want: if line 1 is the line to be deleted, then the
output should be:
0,abc,def
1,mno,pqr
2,stu,vwx
What I have done so far:
f=open(file,'r')
lines = f.readlines()
f.close()
f=open(file,'w')
for line in lines:
    if line.rsplit(',')[0] != 'line#':
        f.write(line)
f.close()
The above lines can delete a specific line#, but I don't know how to rewrite the line number before the first ','.
Here is a function that will do the job.
def removeLine(n, file):
    f = open(file,"r+")
    d = f.readlines()
    f.seek(0)
    for i in range(len(d)):
        if i > n:
            # count=1: replace only the leading line number, not later copies of it
            f.write(d[i].replace(d[i].split(",")[0],str(i -1), 1))
        elif i != n:
            f.write(d[i])
    f.truncate()
    f.close()
Here, n is the number of the line you wish to delete and file is the file path.
This is assuming the line numbers are written in the line as implied by your example input.
If the number of the line is not included at the beginning of each line, as some other answers have assumed, simply remove the first if branch (and turn the elif into an if):
if i > n:
    f.write(d[i].replace(d[i].split(",")[0],str(i -1), 1))
I noticed that your account wasn't created in the past few hours, so I figure that there's no harm in giving you the benefit of the doubt. You will really have more fun on StackOverflow if you spend the time to learn its culture.
I wrote a solution that fits your question's criteria on a file that's already written (you mentioned that you're opening a text file), so I assume it's a CSV.
I figured that I'd answer your question differently than the other solutions that implement the CSV reader library and use a temporary file.
import re
numline_csv = re.compile(r"^\d+,")  # the leading line number and its comma
# substitute your actual file opening here
so_31195910 = """
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
"""
so = so_31195910.splitlines()
# this could be an input or whatever you need
delete_line = 1
line_bank = []
for l in so:
    if l and not l.startswith(str(delete_line)+','):
        print(l)
        # maxsplit=1 leaves the rest of the line (and its commas) intact
        l = numline_csv.split(l, maxsplit=1)
        line_bank.append(l[1])
so = []
for i,l in enumerate(line_bank):
    so.append("%s,%s" % (i,l))
And the output:
>>> so
['0,abc,def', '1,mno,pqr', '2,stu,vwx']
In order to get a line number for each line, you should use the built-in enumerate function:
for line_index, line in enumerate(lines):
    # line_index is 0 for the first line, 1 for the 2nd line, etc.
    pass  # process the line here
In order to separate the first element of the string from the rest of the string, I suggest passing a value for maxsplit to the split method.
>>> '0,abc,def'.split(',')
['0', 'abc', 'def']
>>> '0,abc,def'.split(',',1)
['0', 'abc,def']
>>>
Once you have those two, it's just a matter of joining str(line_index) and line.split(',', 1)[1] with a comma.
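Putting enumerate and maxsplit together, a minimal sketch that works on a list of lines (the function name delete_and_renumber is hypothetical, and it assumes every non-blank line starts with an integer followed by a comma, as in the example):

```python
def delete_and_renumber(lines, delete_index):
    """Drop the line whose leading number equals delete_index, then renumber from 0."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=1: split off only the leading number, keep the other commas
        number, rest = line.split(",", 1)
        if int(number) != delete_index:
            kept.append(rest)
    # enumerate supplies the new consecutive line numbers
    return ["%d,%s" % (i, rest) for i, rest in enumerate(kept)]
```

For a file, read the lines with f.readlines(), pass them in, and write the result back with f.writelines(...); each line's trailing newline stays in rest, so it is preserved.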
I'm writing an assignment to count the number of vowels in a file. So far in my class we have only been using code like this to check for the end of a file:
vowel=0
f=open("filename.txt","r",encoding="utf-8" )
line=f.readline().strip()
while line!="":
    for j in range (len(line)):
        if line[j].isvowel():
            vowel+=1
    line=f.readline().strip()
But this time for our assignment the input file given by our professor is an entire essay, so there are several blank lines throughout the text to separate paragraphs and whatnot, meaning my current code would only count until the first blank line.
Is there any way to check whether my file has reached its end other than checking whether the line is blank? Preferably in a similar fashion to my current code, where it checks something on every iteration of the while loop.
Thanks in advance
Don't loop through a file this way. Instead, use a for loop.
for line in f:
    vowel += sum(ch.isvowel() for ch in line)
In fact your whole program is just:
VOWELS = {'A','E','I','O','U','a','e','i','o','u'}
# I'm assuming this is what isvowel checks, unless you're doing something
# fancy to check if 'y' is a vowel
with open('filename.txt') as f:
    vowel = sum(ch in VOWELS for line in f for ch in line.strip())
That said, if you really want to keep using a while loop for some misguided reason, test the raw result of readline() before stripping it:
while True:
    line = f.readline()
    if line == '':
        # readline() returns '' only at end of file; a blank line comes back as '\n'
        break
    line = line.strip()
Find the end position of the file:
f = open("file.txt","r")
f.seek(0,2) #Jumps to the end
endLocation = f.tell() #Gives you the position of the end of the file
f.seek(0) #Jump to the beginning of the file again
Then you can do:
if line == '' and f.tell() == endLocation:
    break
import io
f = io.open('testfile.txt', 'r')
line = f.readline()
while line != '':  # readline() returns '' only at EOF; blank lines are '\n'
    print(line)
    line = f.readline()
f.close()
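The same idea applied to the assignment, as a sketch that keeps the while-loop shape of the question. The function name count_vowels is hypothetical, and it assumes "vowel" means a, e, i, o, u in either case (no 'y'). The key point is that the next line is read unstripped, so a blank line ('\n') does not end the loop:

```python
VOWELS = set("aeiouAEIOU")  # assumption: 'y' is not counted as a vowel


def count_vowels(path):
    """Count vowels with a while loop that stops only at the real end of the file."""
    vowel = 0
    with open(path, "r", encoding="utf-8") as f:
        line = f.readline()
        while line != "":  # "" means EOF; a blank line is "\n", so the loop continues
            for ch in line:
                if ch in VOWELS:
                    vowel += 1
            line = f.readline()  # read the next line without strip()
    return vowel
```

For a file containing "Hello", a blank line, and "World", this counts e, o and o, giving 3.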
I discovered while following the above suggestions that
for line in f:
does not work for a pandas DataFrame (not that anyone said it would),
because iterating over a DataFrame yields its column labels, not its rows.
For example, if you have a DataFrame with 3 fields (columns) and 9 records (rows), the for loop will stop after the 3rd iteration, not after the 9th. Use df.iterrows() or df.itertuples() to iterate over the rows.
So, basically, I need a program that opens a .dat file, checks each line to see if it meets certain prerequisites, and if it does, copies it into a new CSV file.
The prerequisites are that the line must 1) contain "$W" or "$S" and 2) have as its last value one of a long list of acceptable terms. (I can simply make up a list of terms and hardcode them into a list.)
For example, if the CSV were a list of purchase information and the last item were what was purchased, I would only want to include fruit. In my case, the last item is an ID tag, and I only want to accept a handful of ID tags; there is a list of about 5 acceptable tags. The tags vary greatly in length, however, but they are always the last item on the line (and always the 4th item on the line).
Let me give a better example, again with the fruit.
My original .DAT might be:
DGH$G$H $2.53 London_Port Gyro
DGH.$WFFT$Q5632 $33.54 55n39 Barkdust
UYKJ$S.52UE $23.57 22#3 Apple
WSIAJSM_33$4.FJ4 $223.4 Ha25%ek Banana
Only the line "UYKJ$S.52UE $23.57 22#3 Apple" would be copied, because only it has both 1) $W or $S (in this case $S) and 2) a fruit as the last item. Once the .csv file is made, I am going to need to go back through it and replace all the spaces with commas, but that's not nearly as problematic for me as figuring out how to scan each line for requirements and only copy the ones that are wanted.
I am making a few programs, all very similar to this one, that open .dat files, check each line to see if it meets requirements, and then decide whether to copy it to the new file. But sadly, I have no idea what I am doing. They are all similar enough, though, that once I figure out how to make one, the rest will be easy.
EDIT: The .DAT files are a few thousand lines long, if that matters at all.
EDIT2: Some of my current code snippets:
Right now, my current version is this:
def main():
    #NewFile_Loc = C:\Users\J18509\Documents
    OldFile_Loc=raw_input("Input File for MCLG:")
    OldFile = open(OldFile_Loc,"r")
    OldText = OldFile.read()
    # for i in range(0, len(OldText)):
    #     if (OldText[i] != " "):
    #         print OldText[i]
    i = split_line(OldText)
    if u'$S' in i:
        # $S is in the line
        print i
main()
But it's very choppy still. I'm just learning python.
Brief update: the server I am working on is down, and might be for the next few hours, but I have my new code, which has syntax errors in it, but here it is anyways. I'll update again once I get it working. Thanks a bunch everyone!
import os
NewFilePath = "A:\test.txt"
Acceptable_Values = ('Apple','Banana')
#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    NewFile = open (NewFilePath, 'w')
    NewFile.write('Header 1,','Name Header,','Header 3,','Header 4)
    OldFile_Loc=raw_input("Input File for Program:")
    OldFile = open(OldFile_Loc,"r")
    for line in OldFile:
        LineParts = line.split()
        if (LineParts[0].find($W)) or (LineParts[0].find($S)):
            if LineParts[3] in Acceptable_Values:
                print(LineParts[1], ' is accepted')
                #This Line is acceptable!
                NewFile.write(LineParts[1],',',LineParts[0],',',LineParts[2],',',LineParts[3])
    OldFile.close()
    NewFile.close()
main()
There are two parts you need to implement. First, read a file line by line and write out the lines meeting specific criteria. This is done by:
with open('file.dat') as f:
    for line in f:
        stripped = line.strip() # remove whitespace, including '\n', from both ends
        if test_line(stripped):
            print(stripped) # Write to stdout
The criteria you want to check for are implemented in the function test_line. To check for the occurrence of "$W" or "$S", you can simply use the in operator, like this:
if '$W' not in line and '$S' not in line:
    return False
else:
    return True
To check whether the last item in the line is contained in a fixed list, first split the line using split(), then take the last item using the index notation [-1] (negative indices count from the end of a sequence), and then use the in operator again against your fixed list. This looks like:
items = line.split() # items is a list of strings
last_item = items[-1] # take the last element of the list
if last_item in ['Apple', 'Banana']:
    return True
else:
    return False
Now, you combine these two parts into the test_line function like this:
def test_line(line):
    if '$W' not in line and '$S' not in line:
        return False
    items = line.split() # items is a list of strings
    last_item = items[-1] # take the last element of the list
    if last_item in ['Apple', 'Banana']:
        return True
    else:
        return False
Note that the program writes the result to stdout, which you can easily redirect. If you want to write the output to a file, have a look at Correct way to write line to file in Python
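If you want to go straight from the .dat file to a CSV file, the standard csv module can do the "spaces to commas" step while writing. This is a sketch, not the answer's exact method: the names ACCEPTED_TAGS, MARKERS, wanted and dat_to_csv are hypothetical, and ACCEPTED_TAGS stands in for your own list of about 5 ID tags:

```python
import csv

ACCEPTED_TAGS = {"Apple", "Banana"}  # hypothetical: replace with your ID tags
MARKERS = ("$W", "$S")


def wanted(line, tags=ACCEPTED_TAGS):
    """True if the line contains $W or $S and its last field is an accepted tag."""
    fields = line.split()  # split on any run of whitespace
    if not fields:
        return False  # blank line
    return any(m in line for m in MARKERS) and fields[-1] in tags


def dat_to_csv(src, dst, tags=ACCEPTED_TAGS):
    """Copy wanted lines from file object src to dst as CSV rows; return the count."""
    writer = csv.writer(dst)
    count = 0
    for line in src:
        if wanted(line, tags):
            writer.writerow(line.split())  # one CSV column per whitespace-separated field
            count += 1
    return count
```

Usage: with open("input.dat") as src, open("output.csv", "w", newline="") as dst: dat_to_csv(src, dst). Opening the output with newline="" is what the csv documentation recommends. On the example .DAT above, only the Apple line is written.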
inlineRequirements = ['$W','$S']
endlineRequirements = ['Apple','Banana']
# text mode ('r'/'w'), so each line is a str and can be tested against the str requirements
inputFile = open(input_filename,'r')
outputFile = open(output_filename,'w')
for line in inputFile:
    line = line.strip()
    #trailing and leading whitespace has been removed
    if any(req in line for req in inlineRequirements):
        #passed inline requirement
        lastWord = line.split(' ')[-1]
        if lastWord in endlineRequirements:
            #passed endline requirement
            outputFile.write(line.replace(' ',',') + '\n')
            #replaced spaces with commas, restored the newline and wrote to file
inputFile.close()
outputFile.close()
tags = ['apple', 'banana']
match = ['$W', '$S']
OldFile_Loc=raw_input("Input File for MCLG:")
OldFile = open(OldFile_Loc,"r")
for line in OldFile.readlines(): # Loop through the file
    line = line.strip() # Remove the newline and whitespace
    if line and not line.isspace(): # If the line isn't empty
        lparts = line.split() # Split the line
        if any(tag.lower() == lparts[-1].lower() for tag in tags) and any(c in line for c in match):
            # $S or $W is in the line AND the last section is in tags(case insensitive)
            print line
import re
list_of_fruits = ["Apple","Banana",...]
with open('some.dat') as f:
    for line in f:
        # skip blank lines, then look for $S or $W and an accepted last field
        if line.strip() and re.search(r"\$[SW]", line) and line.split()[-1] in list_of_fruits:
            print("Found: %s" % line.rstrip())
import os

NewFilePath = r"A:\test.txt"  # raw string: in "A:\test.txt" the \t is a tab character
Acceptable_Values = ('Apple', 'Banana')

#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    OldFile_Loc = input("Input File for Program:")
    with open(NewFilePath, 'w') as NewFile, open(OldFile_Loc, 'r') as OldFile:
        # write() takes a single string, so the header is one string
        NewFile.write('Header 1,Name Header,Header 3,Header 4\n')
        for line in OldFile:
            LineParts = line.split()
            if len(LineParts) < 4:
                continue  # skip blank or short lines
            # str.find() returns -1 (truthy) when not found, so test with `in`
            if '$W' in LineParts[0] or '$S' in LineParts[0]:
                if LineParts[3] in Acceptable_Values:
                    print(LineParts[1], 'is accepted')
                    #This Line is acceptable!
                    NewFile.write(','.join([LineParts[1], LineParts[0], LineParts[2], LineParts[3]]) + '\n')

main()
This worked great, and has all the capabilities I needed. The other answers are good, but none of them do 100% of what I needed like this one does.