Multiple Search Terms and Output Files

I have a file I'm searching through and printing the results to a file. I'm wondering if it's possible to amend the code below to read in multiple search terms i.e. print any line that has "test" or "hello" (user will specify) in it but have Python create a new output file for each search term?
i.e. Ofile1 would hold all lines containing "test"
Ofile2 would hold all lines containing "hello" and so on
f = open('file3.txt') #input file
term = raw_input("What would you like to search for?") #takes an input to be searched
for line in f:
    if term in line:
        print (line) #print line of matched term
f.close() #close file
Is this possible to do?

Based on new user's code (with some errors fixed), you could do something like this:
terms = raw_input("What would you like to search for?")
terms = terms.split(" ")
for term in terms:
    f = open('text.txt')
    filename = '{}_output.txt'.format(term)
    o = open(filename, 'w')
    for line in f:
        if term in line:
            o.write(line)
    o.close()
    f.close()
You might think it is better to open the file once and check every term against each line. Depending on the number of terms, that will be more or less efficient; if you want, you could experiment with really big files to compare execution times and learn a bit more.
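A minimal sketch of that single-pass idea, assuming Python 3 and reusing the `<term>_output.txt` naming from the code above. The function name `split_by_terms` and its `out_dir` parameter are hypothetical and only serve the illustration. It opens one output file per term, reads the input once, and writes each line to every file whose term it contains:

```python
import os
from contextlib import ExitStack


def split_by_terms(input_path, terms, out_dir="."):
    """Read input_path once; write matching lines to <term>_output.txt per term."""
    with ExitStack() as stack:
        # One output file per term; ExitStack closes all of them on exit.
        outputs = {
            term: stack.enter_context(
                open(os.path.join(out_dir, "{}_output.txt".format(term)), "w")
            )
            for term in terms
        }
        source = stack.enter_context(open(input_path))
        for line in source:
            for term, out in outputs.items():
                if term in line:
                    out.write(line)
```

Calling `split_by_terms('file3.txt', ['test', 'hello'])` would then fill `test_output.txt` and `hello_output.txt` in the current directory. Whether this beats reopening the input for each term depends on the file size and the number of terms, so it is worth timing both.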

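Split your term by spaces, then use a for loop to go through all of the terms. The input file has to be rewound before each pass, otherwise only the first term finds any lines.
ex:
terms = term.split(" ")
for t in terms:
    filename = t + "_output.txt"
    o = open(filename, 'w')
    f.seek(0)  # rewind the input file so every term scans it from the start
    for line in f:
        if t in line:
            o.write(line)  # write the line that matched the term
    o.close()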

Related

Why code not working while replacing the text in file

I have a file called test.txt.
I need to convert one string in the file which matches the dictionary.
test.txt:
abc
asd
ds
{{ PRODUCT CATEGORY }}
fdsavfacxvasdvvc
dfvssfzxvdfvzd
Code is below:
data = {'PRODUCT CATEGORY':'Customer'}
all_files = ['test.txt']
out_files = ['ut.txt']
read_dict = {}
for file in all_files:
    with open(file,'r') as read_file:
        lines = read_file.readlines()
        read_dict[file] = lines
for in_f, out_f in zip(all_files, out_files):
    with open(in_f,'r') as read_file:
        lines = read_file.readlines()
    with open(out_f,'w+') as write_file:
        for line in lines:
            updated_line = []
            for word in line.split():
                if word in data:
                    updated_line.append(data[word])
                else:
                    updated_line.append(word)
            write_file.writelines(" ".join(updated_line))
            print (" ".join(updated_line))
There is a space before and after PRODUCT CATEGORY inside the braces.
Expected output:
abc
asd
ds
Customer
fdsavfacxvasdvvc
dfvssfzxvdfvzd

Try this
import re
data = {'PRODUCT CATEGORY':'Customer'}
all_files = ['test.txt']
out_files = ['ut.txt']
for in_f, out_f in zip(all_files, out_files):
    with open(in_f,'r') as read_file:
        text = read_file.read()
    for word, replace_with in data.items():
        # match the key wrapped in braces, with optional spaces around it
        text = re.sub(r'\{+ *' + re.escape(word) + r' *\}+', replace_with, text)
    with open(out_f,'w') as write_file:
        write_file.write(text)

You are splitting on whitespace, and the key PRODUCT CATEGORY itself contains a space, so no single word ever matches it exactly. You can see this if you add a print(word) after the for word in line.split() line.
One way to solve this is to replace PRODUCT CATEGORY with PRODUCT_CATEGORY in data and in your test.txt file.
Also, you are not writing a newline after each line in the output file; you should replace:
write_file.writelines(" ".join(updated_line))
with
write_file.writelines(" ".join(updated_line)+"\n")
With both these issues solved you get the desired output.

You can't loop over individual words on the input line because that prevents you from finding a dictionary key which consists of more than one word, like the one you have in your example.
Here is a simple refactoring which prints to standard output just so you can see what you are doing.
data = {'PRODUCT CATEGORY':'Customer'}
all_files = ['test.txt']
for file in all_files:
    with open(file,'r') as read_file:
        for line in read_file:
            for keyword in data:
                token = '{{ %s }}' % keyword
                if token in line:
                    line = line.replace(token, data[keyword])
            print(line, end='')
The end='' is necessary because line already contains a newline, but print wants to supply one of its own; if you write to a file instead of print, you can avoid this particular quirk. (But often, a better design for reusability is to just print to standard output, and let the caller decide what to do with the output.)
It is unclear why you had a separate read_dict for the lines in the input file or why you read the file twice, so I removed those parts.
Looping over a file a line at a time avoids reading the entire file into memory, so if you don't care what's on the previous or the next lines, this is usually a good idea, and scales to much bigger files (you only need to keep one line in memory at a time -- let's hope that one line is not several gigabytes).
Here is a demo (I took the liberty to use just one space in {{ PRODUCT CATEGORY }} and fix the spelling of "customer"): https://repl.it/repls/DarkredMindlessPackagedsoftware#main.py
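If the result should end up in a file after all, the same loop can be wrapped as follows. This is only a sketch: the names `fill_placeholders` and `convert_file` are made up for the illustration, and it assumes the `{{ KEY }}` token format with exactly one space on each side, as in the refactoring above.

```python
def fill_placeholders(lines, data):
    """Yield each line with every '{{ KEY }}' token replaced by data[KEY]."""
    for line in lines:
        for keyword, value in data.items():
            line = line.replace('{{ %s }}' % keyword, value)
        yield line


def convert_file(in_path, out_path, data):
    """Copy in_path to out_path, substituting the placeholders line by line."""
    # Each line keeps its own newline, so write() needs no end='' workaround.
    with open(in_path) as read_file, open(out_path, 'w') as write_file:
        for line in fill_placeholders(read_file, data):
            write_file.write(line)
```

With the files from the question, `convert_file('test.txt', 'ut.txt', {'PRODUCT CATEGORY': 'Customer'})` would process one line at a time, so memory use stays flat for large inputs.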

Reading a file, making some changes and writing the results back

I have an input file (File A) as shown below:
Start of the program
This is my first program ABCDE
End of the program
I receive the program name 'PYTHON' as input, and I need to replace 'ABCDE' with it. So I read the file to find the word 'program' and then replace the string after it as shown below. I have done that in my program. Then, I would like to write the updated string to the original file without changing lines 1 or 3 - just line 2.
Start of the program
This is my first program PYTHON
End of the program
My code:
fileName1 = open(filePath1, "r")
search = "program"
for line in fileName1:
    if search in line:
        line = line.split(" ")
        update = line[5].replace(line[5], input)
        temp = " ".join(line[:5]) + " " + update
        fileName1 = open(filePath1, "r+")
        fileName1.write(temp)
        fileName1.close()
    else:
        fileName1 = open(filePath1, "w+")
        fileName1.write(line)
        fileName1.close()
I am sure this can be done in an elegant way, but I got a little confused with reading and writing as I experimented with the above code. The output is not as expected. What is wrong with my code?

You can do this with a simple replace:
file_a.txt
Start of the program
This is my first program ABCDE
End of the program
code:
with open('file_a.txt', 'r') as file_handle:
    file_content = file_handle.read()
orig_str = 'ABCDE'
rep_str = 'PYTHON'
result = file_content.replace(orig_str, rep_str)
# print(result)
with open('file_a.txt', 'w') as file_handle:
    file_handle.write(result)
Also if just replacing ABCDE is not going to work (it may appear in other parts of file as well), then you can use more specific patterns or even a regular expression to replace it more accurately.
For example, here we just replace ABCDE if it comes after program:
with open('file_a.txt', 'r') as file_handle:
    file_content = file_handle.read()
orig_str = 'ABCDE'
rep_str = 'PYTHON'
result = file_content.replace('program {}'.format(orig_str),
                              'program {}'.format(rep_str))
# print(result)
with open('file_a.txt', 'w') as file_handle:
    file_handle.write(result)
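When the old value is not known in advance, the regular-expression route mentioned above could look roughly like this. It is a sketch under one assumption taken from the sample file: the value to replace is the single word that follows the keyword at the end of a line. The function name `replace_after_keyword` is hypothetical.

```python
import re


def replace_after_keyword(text, keyword, new_value):
    """Replace the single word that follows `keyword` at the end of a line."""
    # re.MULTILINE makes $ match at the end of every line, not only at the
    # end of the whole text; [ \t]+ keeps the match on one line.
    pattern = re.compile(
        r'(\b{}[ \t]+)\S+$'.format(re.escape(keyword)), re.MULTILINE
    )
    # A function as replacement avoids surprises if new_value has backslashes.
    return pattern.sub(lambda m: m.group(1) + new_value, text)
```

Lines 1 and 3 of the sample end with "program" itself, so nothing follows the keyword there and they are left alone. The returned string can be written back with the same `with open('file_a.txt', 'w')` block as above.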

Searching for a term in a text file using python

I'm really desperate for some help on this python code please. I need to search for a variable (string), return it and the data present on the same line as the variable data.
I've managed to create a variable and then search for the variable in a text file, however if the data contained in the variable is found in the text file the contents of the whole text file is printed out not the line in which the variable data exists.
This is my code so far, please help:
number = input("Please enter the number of the item that you want to find:")
f = open("file.txt", "r")
lines = f.read()
if lines.find("number"):
    print (lines)
else:
    f.close
Thank you in advance.

See my changes below:
number = input("Please enter the number of the item that you want to find:")
f = open("file.txt", "r")
lines = f.readlines()  # a list of lines; f.read() would return one big string
for line in lines: # check each line instead
    if number in line: # if the number you're looking for is present
        print(line) # print it
f.close()

It goes like
lines_containing_number = [line for line in lines if number in line]
This gives you a list of only those lines that contain the number (with lines being the list returned by f.readlines()), and then you can simply print out the contents of the list...

If you use a 'with' statement, you don't have to close the file. It will be handled by with. Otherwise you have to call f.close(). Solution:
number = input("Please enter the number of the item that you want to find:")
with open('file.txt', 'r') as f:
    for line in f:
        if number in line:
            print(line)

How to print a specific line from a file

I am trying to print a specific line from the file "Scores", which is option B. This is my code:
print("Option A: Show all scores\nOption B: Show a record\nOption Q: Quit")
decision = input("Enter A, B, C or Q: ")
myFile = open("Scores.txt", "rt")
if decision == "A":
    record = myFile.read()
    print(record)
    myFile.close()
elif decision == "B" or decision == "b":
    playerName = input("Enter a player name to view their scores: ")
    record = myFile.read()
    answer = record.find(playerName)
    for line in answer:
        print(line)
elif decision == "Q" or decision == "q":
    exit
I went for Option B, then I entered a player name that holds the score of the player, but it throws this error message:
line 12, in <module>
for line in answer():
TypeError: 'int' object is not callable

My two cents:
file = open("file")
lines = file.readlines()
for line in lines:
    if playerName in line:
        print(line)
file.close()
Hope it works!

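The find() method returns the index of the match (0 or greater) if it succeeds, -1 otherwise, so the result has to be compared with -1 rather than tested for truth.
You should loop over your content line by line, as follows:
for line in myFile:
    if line.find(playerName) != -1:
        print(line)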
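A short illustration of why the return value of find() should not be used directly as a condition. The record "Quido 42" is made up for the example:

```python
line = "Quido 42\n"  # made-up record, not taken from the real Scores.txt

print(line.find("Quido"))  # 0: found at the very start, but 0 is falsy
print(line.find("Anna"))   # -1: not found, but -1 is truthy
print(bool(line.find("Quido")), bool(line.find("Anna")))  # False True

# The membership test states the intent directly and avoids the trap.
print("Quido" in line, "Anna" in line)  # True False
```

So a bare `if line.find(playerName):` skips a name at the start of a line and prints every line that does not contain the name at all.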

A safer way to read the file and find the data, so that you will not run out of memory by storing the whole file in memory:
playerName = input("Enter a player name to view their scores: ")
with open("Scores.txt", 'r') as f:
    for row in f:
        if playerName in row:
            print(row)
This way you use with, which closes the file by itself as soon as the block is left, even if an error occurs. Python reads the file line by line and keeps only one line in memory at a time, so you can process huge files without worrying about memory issues.
Hope it helps :)

Working with str methods will take more acrobatics. Try the following,
import re
# re.escape() keeps special characters in the name from being read as regex syntax
p = re.compile(r"\b{}\b".format(re.escape(playerName))) # keep it ready
# inside option B
for line in myFile: # no need to `.read()` it
    match = p.search(line)
    if match:
        print(line)
        break # if there is only one record for playerName
See if it works for you.
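What the \b word boundaries buy over a plain substring test can be seen in a small sketch. The records and the helper name `lines_for_player` are made up for the illustration:

```python
import re


def lines_for_player(lines, player_name):
    """Return the lines that contain player_name as a whole word."""
    pattern = re.compile(r"\b{}\b".format(re.escape(player_name)))
    return [line for line in lines if pattern.search(line)]


sample = ["Ann 12\n", "Anna 15\n", "Bob 9\n"]  # made-up records

print(lines_for_player(sample, "Ann"))             # ['Ann 12\n']
print([line for line in sample if "Ann" in line])  # ['Ann 12\n', 'Anna 15\n']
```

The substring test also picks up "Anna", while the regex only accepts "Ann" as a complete word.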

A similar thing is covered here:
Reading specific lines only (Python)
fp = open("file")
for line in fp:
    # substring test: each line ends with a newline and may hold more than the name
    if playername in line:
        print(line)
fp.close()
I also notice that you don't close your file for every decision; you should make sure that happens.

A few Python idioms and a small optimization
There are many answers here; my sample brings in a few Python idioms and optimizes things a bit:
fname = "Scores.txt"
player_name = "Quido"
with open(fname) as f:
    for line in f:
        if player_name in line:
            print(line)
            break
print("Going on doing following tasks.")
The with block closes the open file on exiting the inner block. There is no need to call f.close(), and it is safe in case of problems while reading the file.
for line in f: shows that iterating over a file opened in text mode yields one line per iteration.
break after printing the line with the player stops iterating over the lines, assuming there is only one such line or that we are happy with the very first one. If this is not the case, removing the break prints all lines containing the player name.
As the lines returned from the text file iterator contain a newline, you may prefer to get rid of it. Use print(line.strip()) in that case, which removes all whitespace from the start and end of the line.
The final print shows that the program continues after it has processed the lines.
It may happen that you get no output for a name that appears to be present in the file. In such a case, check the letter case: the in test is case-sensitive, so if the file stores the names in a particular casing, you have to enter the name in exactly that casing.
Another option is to lower-case the player_name and compare it against the lower-cased line:
fname = "Scores.txt"
player_name = "Quido"
normalized_player_name = player_name.lower()
with open(fname) as f:
    for line in f:
        if normalized_player_name in line.lower():
            print(line.strip())
            break # comment out to print all lines with the player
print("Going on doing following tasks.")
Note that we normalize the player_name outside of the loop to be a bit faster. Lower-casing inside the loop would work, but would do the same task repeatedly.
The line is printed with the exact letter case it has in the file.

Find and write certain words in lines to a file in python

I have a .txt file in Cyrillic. Its structure is like this, but in Cyrillic:
city text text text.#1#N
river, text text.#3#Name (Name1, Name2, Name3)
lake text text text.#5#N (Name1)
mountain text text.#23#Na
What I need:
1) look at the first word in a line
2) if it is "river" then write all words after "#3#", i.e. Name (Name1, Name2, Name3) in a file 'river'.
That I have to do also with another first words in lines, i. e. city, lake, mountain.
What I have done only finds if the first word is "city" and saves whole line to a file:
lines = f.readlines()
for line in lines:
    if line.startswith('city'):
        f2.write(line)
f.close()
f2.close()
I know I can use regex to find Names: #[0-9]+#(\W+) but I don't know how to implement it to a code.
I really need your help! And I'm glad for any help.

If all of your rivers have commas after them, as in the sample you posted, I would do something like:
for line in f:
    items = line.split(",")
    if items[0] == "river":
        # take the text after the last "#", then the part inside the parentheses
        names = line.split("#")[-1].strip().split("(")[1].split(")")[0].split(",")
        names = [name.strip() for name in names]  # e.g. ['Name1', 'Name2', 'Name3']
        #.. now write each one

What you want to do here is avoid hard-coding the names of the files you need. Instead, glean them from the input file. Create a dictionary of the files you need to write to, opening each one as it's needed. Something like this (untested and probably in need of some adaptation):
outfiles = {}
try:
    with open("infile.txt") as infile:
        for line in infile:
            tag = line.split(" ", 1)[0].strip("*, ") # e.g. "river"
            if tag not in outfiles: # if it's the first time we've seen a tag
                outfiles[tag] = open(tag + ".txt", "w") # open tag.txt to write
            content = line.rsplit("#", 1)[-1].strip("* \n")
            outfiles[tag].write(content + "\n")
finally:
    for outfile in outfiles.values():
        outfile.close()
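Since the question asks how to put the regex to work, here is a sketch of the same grouping done with a regular expression, assuming Python 3 (where \w also matches Cyrillic letters). The pattern is a variation on the one in the question, not the asker's original, and the name `group_by_first_word` is hypothetical:

```python
import re
from collections import defaultdict

# First word of the line, then whatever follows the "#<digits>#" marker.
LINE_RE = re.compile(r"^(\w+).*#[0-9]+#(.+)$")


def group_by_first_word(lines):
    """Map each first word (e.g. 'river') to the texts found after '#<n>#'."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines without a marker are skipped
            tag, content = match.groups()
            groups[tag].append(content.strip())
    return groups
```

Each key of the result can then be written to its own file, for example with `open(tag + ".txt", "w", encoding="utf-8")`, which keeps the Cyrillic text intact regardless of the platform default encoding.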
