Finding string in a file [duplicate] - python

This question already has answers here:
Accessing the index in 'for' loops
(26 answers)
How can I use `return` to get back multiple values from a loop? Can I put them in a list?
(2 answers)
How should I read a file line-by-line in Python?
(3 answers)
Closed 6 years ago.
First timer here with really using files and I/O. I'm running my code through a tester and the tester calls the different files I'm working with through my code. So for this, I'm representing the file as "filename" below and the string I'm looking for in that file as "s". I'm pretty sure I'm going through the lines of the code and searching for the string correctly. This is what I have for that :
def locate(filename, s):
    file= open(filename)
    line= file.readlines()
    for s in line:
        if s in line:
            return [line.count]
I'm aware the return line isn't correct. How would I return the number of the line that the string I'm looking for is located on as a list?

You can use enumerate to keep track of the line number:
def locate(filename, s):
    with open(filename) as f:
        return [i for i, line in enumerate(f, 1) if s in line]
If the search string appears on the first and third lines, this produces the following output:
[1, 3]

You can use enumerate.
Sample Text File
hello hey s hi
hola
s
Code
def locate(filename, letter_to_find):
    locations = []
    with open(filename, 'r') as f:
        for line_num, line in enumerate(f):
            for word in line.split(' '):
                if letter_to_find in word:
                    locations.append(line_num)
    return locations
Output
[0, 2]
As we can see, the string s appears on lines 0 and 2.
Note: enumerate starts counting at 0 by default.
What's going on
Opens the file with read permissions.
Iterates over each line, enumerating them as it goes and keeping track of the line number in line_num.
Iterates over each word in the line.
If the letter_to_find that you passed into the function is in word, it appends the line_num to locations.
Returns locations.
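The steps above can be tweaked in two small ways. This is only a sketch, not the answerer's code: it assumes you want 1-based line numbers (the start argument of enumerate) and want each line reported once even when several of its words match. The name locate_once is made up for illustration.

```python
def locate_once(filename, letter_to_find):
    """Return 1-based numbers of lines with a word containing letter_to_find."""
    locations = []
    with open(filename, 'r') as f:
        # start=1 makes enumerate count from 1 instead of the default 0
        for line_num, line in enumerate(f, start=1):
            # any() stops at the first matching word, so each line is
            # recorded at most once even if several words match
            if any(letter_to_find in word for word in line.split()):
                locations.append(line_num)
    return locations
```

With the sample text file above, locate_once("sample.txt", "s") would give [1, 3] rather than [0, 2], assuming the file is saved under that (hypothetical) name.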

These are the problem lines:
    for s in line:
        if s in line:
The loop variable s overwrites the search string, so you have to read each line into a variable other than s:
def locate(filename, s):
    file = open(filename)
    line = file.readlines()
    index = 0
    for l in line:
        print(l)
        index = index + 1
        if s in l:
            # 1-based number of the first matching line
            return index
print(locate("/Temp/s.txt", "s"))


python enumerate looping through a file [duplicate]

This question already has answers here:
Read a file starting from the second line in python
(3 answers)
Using python, how to read a file starting at the seventh line ?
(11 answers)
Closed 12 months ago.
How does this enumerate work? I want a specific starting index, yet the loop goes too far (index out of range).
def endingIndexOfTable(file, index):
    r = re.compile('^V.*(.).*(.).*(.).*(-).*(-).*(.).*(.).*(:).*$')
    for i, line in enumerate(file, start= index):
        if list(filter(r.match, line)) or "Sales Tax" in line:
            return i
I want my program to start searching from line index and to return the line where I find the string I am looking for.
I don't think you can start at a specific line of a file. I think you have to skip all the preceding lines first:
def endingIndexOfTable(file, index):
    r = re.compile('^V.*(.).*(.).*(.).*(-).*(-).*(.).*(.).*(:).*$')
    for i, line in enumerate(file):
        if i >= index:
            if list(filter(r.match, line)) or "Sales Tax" in line:
                return i
Although, did you mean return line?
Then, the version with islice should be like this:
from itertools import islice
import re
def endingIndexOfTable(file, index):
    r = re.compile('^V.*(.).*(.).*(.).*(-).*(-).*(.).*(.).*(:).*$')
    # islice skips the first `index` (i, line) pairs, so i is still the real line number
    for i, line in islice(enumerate(file), index, None):
        if list(filter(r.match, line)) or "Sales Tax" in line:
            return i
(again assuming that both the regex and the return are correct)
EDIT
I screwed up in the same way as OP. This does not answer the question.
I don't want to deal with your regex, but here's one way to achieve the logic you need for searching from a specific line. It would load the entire file in memory though, and not actually read just the specific line.
poem.txt is just the file I used to test. Contents:
Author of the poem is: Me
poem is called: Test
AAFgz
S2zergtrxbhcn
Dzrgxt
Frhgc
Gzxcnhvjzx
xghrfcan a
jvzxhdyrfcv
kh
def read_by_line(file, index):
    for i, line in enumerate(file.readlines(), start=index):
        print(line)
        if "a" in line: # if condition could have been your regex stuff
            return i
with open('poem.txt', 'r') as file_object:
    print(read_by_line(file_object, 5))
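To make the difference concrete: the start argument of enumerate only changes the number the counter begins at; it does not skip any lines. Here is a minimal sketch, assuming the goal is to skip the first index lines and still report real (0-based) line numbers. first_match_from is a hypothetical name, and a plain substring test stands in for the regex.

```python
from itertools import islice

def first_match_from(lines, index, needle):
    """Return the 0-based number of the first line at or after `index`
    that contains `needle`, or None if there is none."""
    # islice skips the first `index` lines; start=index keeps the
    # counter aligned with the real position in the file
    for i, line in enumerate(islice(lines, index, None), start=index):
        if needle in line:
            return i
    return None

# enumerate(start=...) on its own only relabels the items:
print(list(enumerate(["x", "y"], start=5)))  # [(5, 'x'), (6, 'y')]
```

Any iterable of lines works here, including an open file object, and nothing is loaded into memory beyond the current line.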

Want to remove \n from len [duplicate]

This question already has answers here:
How to read a file line-by-line into a list?
(28 answers)
Closed 1 year ago.
Here is my task:
You are provided a books.txt file, which includes the book titles, each one written on a separate line.
Read the title one by one and output the code for each book on a separate line.
For example, if the books.txt file contains:
Some book
Another book
Your program should output:
S9
A12
file = open("/usercode/files/books.txt", "r")
with file as f:
    lines = f.readlines()
    for i in lines:
        count = len(i)
        count = str(count - 1)
        print(i[0]+count)
file.close()
This outputs everything correctly except the last line. I think that is because of how the last line is handled after the final count, if that makes any sense (I could be completely wrong; I am still learning).
Basically, I want to know where I am going wrong in my code. I believe it is the way I structured the for i in lines part, and that I should have handled the \n differently from this:
    count = len(i)
    count = str(count - 1)
ANSWER
Thank you for informing me. Adding i = i.strip() strips newlines (\n), which sorted the problem!
Working code:
file = open("/usercode/files/books.txt", "r")
with file as f:
    lines = f.readlines()
    for i in lines:
        i = i.strip('\n') # Strips newlines (\n)
        count = str(len(i))
        print(i[0]+count)
file.close()
You can use strip() on i to remove the newlines.
strip: returns a copy of the string with the leading and trailing characters removed.
(See the Python docs for strip.)
You can convert your count with str() so it can be concatenated and printed.
str(): returns a string containing a printable representation of an object.
(See the Python docs for str.)
As the comments point out, I also suggest renaming i to line for readability.
file = open("books.txt", "r")
with file as f:
    lines = f.readlines()
    for line in lines:
        count = str(len(line.strip()))
        print(line[0]+count)
file.close()
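For what it's worth, the reason the first version failed only on the last line is that every line returned by readlines() ends in "\n" except, possibly, the last one. Subtracting 1 from len() therefore undercounts when the file has no trailing newline. A small sketch, assuming the code is the first letter followed by the title length; book_code is a made-up helper name.

```python
def book_code(line):
    """Build a code such as 'S9' from one title line."""
    # rstrip('\n') removes a trailing newline only if one is present,
    # so the last line of the file is handled the same as the others
    title = line.rstrip('\n')
    return title[0] + str(len(title))

# The last line of a file often has no trailing newline.
for raw in ["Some book\n", "Another book"]:
    print(book_code(raw))
```

This prints S9 and A12 for the two sample titles from the task.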

Inconsistent return values when using regex functions [duplicate]

This question already has answers here:
Why can't I call read() twice on an open file?
(7 answers)
Python : The second for loop is not running
(1 answer)
Closed 4 years ago.
My code is behaving strangely, and I have a feeling it has to do with the regular expressions I'm using.
I'm trying to determine the number of total words, number of unique words, and number of sentences in a text file.
Here is my code:
import sys
import re
file = open('sample.txt', 'r')
def word_count(file):
    words = []
    reg_ex = r"[A-Za-z0-9']+"
    p = re.compile(reg_ex)
    for l in file:
        for i in p.findall(l):
            words.append(i)
    return len(words), len(set(words))
def sentence_count(file):
    sentences = []
    reg_ex = r'[a-zA-Z0-9][.!?]'
    p = re.compile(reg_ex)
    for l in file:
        for i in p.findall(l):
            sentences.append(i)
    return sentences, len(sentences)
sentence, sentence_count = sentence_count(file)
word_count, unique_word_count = word_count(file)
print('Total word count: {}\n'.format(word_count) +
      'Unique words: {}\n'.format(unique_word_count) +
      'Sentences: {}'.format(sentence_count))
The output is the following:
Total word count: 0
Unique words: 0
Sentences: 5
What is really strange is that if I comment out the sentence_count() function, the word_count() function starts working and outputs the correct numbers.
Why is this inconsistency happening? If I comment out either function, one will output the correct value while the other will output 0's. Can someone help me such that both functions work?
The issue is that you can only iterate over an open file once. You need to either reopen or rewind the file to iterate over it again.
For example:
with open('sample.txt', 'r') as f:
    sentence, sentence_count = sentence_count(f)
with open('sample.txt', 'r') as f:
    word_count, unique_word_count = word_count(f)
Alternatively, f.seek(0) would rewind the file.
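A minimal sketch of the rewind approach. It uses an in-memory io.StringIO in place of sample.txt so that it runs on its own, and count_lines is a hypothetical stand-in for the two counting functions in the question.

```python
import io

def count_lines(f):
    """Consume the file object and return how many lines it yielded."""
    return sum(1 for _ in f)

f = io.StringIO("One. Two!\nThree?\n")
first = count_lines(f)    # reads to the end of the stream
second = count_lines(f)   # nothing left to read, so this is 0
f.seek(0)                 # rewind to the beginning
third = count_lines(f)    # the lines are available again
print(first, second, third)
```

The second call sees an exhausted stream, which is the same thing that happens to whichever function runs second in the question.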
Make sure to open and close your file properly. One way to do this is to read all the text into a variable first.
with open('sample.txt', 'r') as f:
    file = f.read()
The with statement opens the file and safely closes the file handle afterwards. Once you have read all the contents into file, you no longer need the file open. Note that file is then a string, so loop over file.splitlines() rather than over file itself, which would yield single characters.

Delete line from text file if line contains one of few specified strings Python [duplicate]

This question already has answers here:
How to check if a string contains an element from a list in Python
(8 answers)
Closed 5 years ago.
I presently have code that deletes all lines from a text file that contain one specific string. Here it is:
import os
with open(r"oldfile") as f, open(r"workfile", "w") as working:
    for line in f:
        if "string1" not in line:
            working.write(line)
os.remove(r"oldfile")
os.rename(r"workfile", r"oldfile")
My question is: how can I include other strings? In other words, I want to tell the script that if a line contains "string1" or some other string "string2", then delete that line. I know I could just repeat the code I put above for every such string, but I'm certain there's some shorter and more efficient way to write that.
Many thanks in advance!
Just abstract it out into a function and use that?
def should_remove_line(line, stop_words):
    return any([word in line for word in stop_words])
stop_words = ["string1", "string2"]
with open(r"oldfile") as f, open(r"workfile", "w") as working:
    for line in f:
        if not should_remove_line(line, stop_words):
            working.write(line)
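It might be good to have a function:
def contains(list_of_strings_to_check,line):
    # Note: despite its name, this returns True only when the line
    # contains none of the strings, i.e. when the line should be kept.
    for string in list_of_strings_to_check:
        if string in line:
            return False
    return True
list_of_strings = ["string1","string2",...]
...
for line in f:
    if contains(list_of_strings,line):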
You can loop through a list of your blacklisted strings while keeping track of if one of the blacklisted strings was present like this:
import os
blacklist = ["string1", "string2"]
with open(r"oldfile") as f, open(r"workfile", "w") as working:
    for line in f:
        write = True
        for string in blacklist:
            if string in line:
                write = False
                break
        if write:
            working.write(line)
os.remove(r"oldfile")
os.rename(r"workfile", r"oldfile")
if "string1" in line or "string2" in line:
This should work I think

How to while loop until the end of a file in Python without checking for empty line? [duplicate]

This question already has answers here:
How to read a large file - line by line?
(11 answers)
Closed 7 months ago.
I'm writing an assignment to count the number of vowels in a file, currently in my class we have only been using code like this to check for the end of a file:
vowel=0
f=open("filename.txt","r",encoding="utf-8" )
line=f.readline().strip()
while line!="":
    for j in range (len(line)):
        if line[j].isvowel():
            vowel+=1
    line=f.readline().strip()
But this time for our assignment the input file given by our professor is an entire essay, so there are several blank lines throughout the text to separate paragraphs and whatnot, meaning my current code would only count until the first blank line.
Is there any way to check if my file has reached its end other than checking for if the line is blank? Preferably in a similar fashion that I have my code in currently, where it checks for something every single iteration of the while loop
Thanks in advance
Don't loop through a file this way. Instead use a for loop.
for line in f:
    vowel += sum(ch.isvowel() for ch in line)
In fact your whole program is just:
VOWELS = {'A','E','I','O','U','a','e','i','o','u'}
# I'm assuming this is what isvowel checks, unless you're doing something
# fancy to check if 'y' is a vowel
with open('filename.txt') as f:
    vowel = sum(ch in VOWELS for line in f for ch in line.strip())
That said, if you really want to keep using a while loop for some misguided reason:
while True:
    line = f.readline().strip()
    if line == '':
        # either end of file or just a blank line.....
        # we'll assume EOF, because we don't have a choice with the while loop!
        break
Find the end position of the file:
f = open("file.txt","r")
f.seek(0,2) # Jumps to the end
endLocation = f.tell() # Gives you the end location (offset from the start)
f.seek(0) # Jumps back to the beginning of the file
Then you can do:
if line == '' and f.tell() == endLocation:
    break
import io
f = io.open('testfile.txt', 'r')
line = f.readline()
while line != '':
    print(line)
    line = f.readline()
f.close()
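The loop above works because readline() returns an empty string only at end of file; a blank line comes back as "\n", which is not empty. The problem in the question comes from calling strip() before the comparison. Below is a rough sketch of vowel counting in that style. It assumes the vowels are a, e, i, o, u (str has no isvowel method), and io.StringIO stands in for the opened essay file.

```python
import io

VOWELS = set("aeiouAEIOU")  # assumption: 'y' is not counted

def count_vowels(f):
    """Count vowels in a file-like object using a readline() loop."""
    total = 0
    line = f.readline()      # do not strip here
    while line != "":        # "" means end of file; a blank line is "\n"
        total += sum(ch in VOWELS for ch in line)
        line = f.readline()
    return total

# A blank line between paragraphs does not end the loop.
essay = io.StringIO("First paragraph.\n\nSecond one.\n")
print(count_vowels(essay))
```

This keeps the check-every-iteration structure the question asks for, although the for loop shown earlier is still the more idiomatic choice.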
I discovered while following the above suggestions that
    for line in f:
does not work for a pandas DataFrame (not that anyone said it would),
because iterating over a DataFrame yields its column labels, not its rows.
For example, if you have a DataFrame with 3 fields (columns) and 9 records (rows), the for loop stops after the 3rd iteration, not after the 9th.
Teresa
