Checking data in a file for duplicates (Python)

I am trying to make a list of topics for another project to use and I am storing the topics in Topics.txt. However, when the topics are stored in the file, I do not want duplicate topics. So when I am saving my topics to my Topics.txt file, I also save them to a Duplicates.txt file. What I want to do is create a conditional statement that won't add topics to Topics.txt if the topics are in the Duplicates.txt. My problem is, I don't know how I could create a conditional statement that could check if the topic is listed in Duplicates.txt. A problem may arise if you scan for keywords such as "music", seeing that "electro-music" contains the word "music".
Entry = input("Enter topic: ")
Topic = Entry + "\n"
Readfilename = "Duplicates.txt"
Readfile = open(Readfilename, "r")
Readdata = Readfile.read()
Readfile.close()
if Topic not in Duplicates:
    Filename = "Topics.txt"
    File = open(Filename, "a")
    File.append(Topic)
    File.close()
    Duplicate = Topic + "\n"
    Readfile = open(Readfilename, "a")
    Readfile.append(Topic)
    Readfile.close()

You can read the file line by line and compare each whole line with the new topic, which results in a solution like this one:
Entry = input("Enter topic: ")
Topic = Entry + "\n"
Readfilename = "Duplicates.txt"
found = False
with open(Readfilename, "r") as Readfile:
    for line in Readfile:
        if Topic == line:
            found = True
            break # no need to read more of the file
if not found:
    Filename = "Topics.txt"
    with open(Filename, "a") as File:
        File.write(Topic)
    with open(Readfilename, "a") as Readfile:
        Readfile.write(Topic)

You can store your topics in a set. A set is a collection of unique items.
topics = {'Banjo', 'Guitar', 'Piano'}
You can check for membership using:
>>> 'Banjo' in topics
True
You add new things to a set via .add():
>>> topics.add('Iceskating')
>>> topics
{'Banjo', 'Guitar', 'Piano', 'Iceskating'}
(The order in which a set is displayed is arbitrary.)
See the Python 3 documentation on sets and the section on sets in the tutorial.
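Combining the two answers, a minimal sketch could load the existing topics into a set and compare whole lines, so that "music" does not match "electro-music". The helper names `load_topics` and `add_topic` are hypothetical, and the sketch assumes Topics.txt holds one topic per line.

```python
from pathlib import Path


def load_topics(path):
    """Return the set of topics stored one per line in `path` (empty if missing)."""
    file = Path(path)
    if not file.exists():
        return set()
    with file.open("r", encoding="utf-8") as handle:
        # Strip the trailing newline so comparisons use the whole topic text.
        return {line.strip() for line in handle if line.strip()}


def add_topic(path, topic):
    """Append `topic` to `path` unless an identical line is already there.

    Returns True if the topic was written, False if it was empty or a duplicate.
    """
    topic = topic.strip()
    if not topic or topic in load_topics(path):
        return False
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(topic + "\n")
    return True
```

With this approach the topics file itself serves as the record of what has been stored, so a separate Duplicates.txt may not be needed.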

Related

Creating a search function in a list from a text file

Hello everyone. I have a Python assignment that requires me to do the following:
Download this CSV file of female Oscar winners (https://docs.google.com/document/d/1Bq2T4m7FhWVXEJlD_UGti0zrIaoRCxDfRBVPOZq89bI/edit?usp=sharing) and open it in a text editor on your computer
Add a text file to your sandbox project named OscarWinnersFemales.txt
Copy and paste several lines from the original file into your sandbox file. Make sure that you include the header.
Write a Python program that does the following:
Open the file and store the file object in a variable
Read the entire contents line by line into a list and strip away the newline character at the end of each line
Using list slicing, print lines 4 through 7 of your file
Write code that will ask the user for an actress name and then search the list to see if it is in there. If it is it will display the record and if it is not it will display Sorry not found.
Close the file
Below is the code I currently have. I've already completed the first three bullet points but I can't figure out how to implement a search function into the list. Could anyone help clarify it for me? Thanks.
f = open('OscarsWinnersFemales.txt')
f = ([x.strip("\n") for x in f.readlines()])
print(f[3:7])
Here's what I tried already but it just keeps returning failure:
def search_func():
    actress = input("Enter an actress name: ")
    for x in f:
        if actress in f:
            print("success")
        else:
            print("failure")
search_func()

I hate it when people use complicated commands like ([x.strip("\n") for x in f.readlines()]), so I'll just use multiple lines, but you can do what you like.
with open("OscarWinnersFemales.txt") as file:
    lines = file.readlines()  # the file is closed when this block ends
data = {} # maps each actress name to a summary of her record
for i, d in enumerate(lines):
    d = d.strip("\n")
    lines[i] = d
    try:
        index, year, age, name, movie = d.split(",")
    except ValueError:
        # assumes a sixth field means the movie entry itself contains a comma
        index, year, age, name, movie, movie2 = d.split(",")
        movie += " and " + movie2
    data[name] = f"{index}-> {year}-{age} | {movie}"
print(lines[3:7])
def search_actr(name):
    if name in data:
        print(data[name])
    else:
        print("Actress does not exist in database. Remember to use capitals and their full name")
I apologize if there are any errors; I decided not to download the file, so everything I wrote is based on my knowledge and testing.

I have figured it out
file = open("OscarWinnersFemales.txt","r")
OscarWinnersFemales_List = []
for line in file:
    stripped_line = line.strip()
    OscarWinnersFemales_List.append(stripped_line)
file.close()
print(OscarWinnersFemales_List[3:7])
print()
actress_line = 0
name = input("Enter An Actress's Name: ")
for line in OscarWinnersFemales_List:
    if name in line:
        actress_line = line
        break
if actress_line == 0:
    print("Sorry, not found.")
else:
    print()
    print(actress_line)
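The same search can also be wrapped in a function that returns every matching record. This is only a sketch, not part of the assignment: `find_records` is a hypothetical name, the case-insensitive comparison is an added assumption, and the sample rows are made-up placeholders rather than lines from the real file.

```python
def find_records(lines, name):
    """Return all lines that contain `name`, ignoring upper/lower case."""
    needle = name.strip().lower()
    if not needle:
        return []
    return [line for line in lines if needle in line.lower()]


# Placeholder rows for illustration only (not real data from the CSV).
sample = [
    "Index,Year,Age,Name,Movie",
    "1,2001,30,Example Actress,Example Movie",
    "2,2002,41,Another Person,Second Movie",
]

matches = find_records(sample, "example actress")
if matches:
    for record in matches:
        print(record)
else:
    print("Sorry, not found.")
```

Returning a list instead of printing inside the loop avoids the "success/failure on every line" problem from the first attempt, because the decision to print "Sorry, not found." is made once, after the whole list has been checked.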

So I just created a text file and copied text from an already existing file using a with loop. How do I open newly created file in same prog?

I'm making a program that reads text from an input file and copies it into a new file whose name you enter. Then I need to replace a few words in the new file and print how many words were replaced. This is my code so far, but since the with block closes the newly created file, I have no idea how to open it again for reading, writing, and counting. This is my awful code so far:
filename = input("Enter the text file name: ")
inputFile = open(filename, "r")
b = input("Enter the new text file name: ")
uusFail = open(b + ".txt", "w+")
f = uusFail
with inputFile as input:
    with uusFail as output:
        for line in input:
            output.write(line)
lines = []
asendus = {'hello': 'tere', 'Hello': 'Tere'}
with uusFail as infile:
    for line in infile:
        for src, target in asendus.items():
            line = line.replace(src, target)
        lines.append(line)
with uusFail as outfile:
    for line in lines:
        outfile.write(line)

There are a lot of unnecessary loops in your code. When you read the file, you can treat its contents as a whole, count the occurrences, and replace them. Here is a modified version of your code:
infile = input('Enter file name: ')
outfile = input('Enter out file: ')
with open(infile) as f:
    content = f.read()
asendus = {'hello': 'tere', 'Hello': 'Tere'}
my_count = 0
for src, target in asendus.items():
    my_count += content.count(src)  # count the matches before replacing them
    content = content.replace(src, target)
with open(f'{outfile}.txt', 'w+') as f:
    f.write(content)
print(my_count)  # number of words replaced

You need to reopen the file in the second block of code:
with open(b+".txt", "r") as infile:
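Put together, reopening the file by name could look like the sketch below. The function name `copy_and_replace` and its parameters are hypothetical; it copies the source file, reopens the copy for reading, and then rewrites it with the replacements applied.

```python
def copy_and_replace(src_path, dst_path, replacements):
    """Copy src_path to dst_path, then replace words in the copy.

    Returns the total number of replacements made.
    """
    # Step 1: copy the text; both files are closed when the block ends.
    with open(src_path, "r") as source, open(dst_path, "w") as target:
        for line in source:
            target.write(line)

    # Step 2: a closed file object cannot be reused, so open the copy again by name.
    with open(dst_path, "r") as infile:
        content = infile.read()

    count = 0
    for old, new in replacements.items():
        count += content.count(old)
        content = content.replace(old, new)

    # Step 3: open once more in write mode to store the modified text.
    with open(dst_path, "w") as outfile:
        outfile.write(content)
    return count
```

A call such as `copy_and_replace("in.txt", "out.txt", {'hello': 'tere', 'Hello': 'Tere'})` would return the count, which can then be printed.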

Cannot read data from textfile [duplicate]

The idea is to write Python code that reads data from the text file below.
https://www.minorplanetcenter.net/iau/MPCORB/CometEls.txt
Later on, I would like to filter the comet data by name, magnitude, etc.
Right now my problem is getting any data output.
My code is:
import os.path
word_dict = {}
scriptpath = os.path.dirname(__file__)
filename = os.path.join(scriptpath, 'CometEls.txt','r')
for line in filename:
    line = line.strip()
    relation = line.split(' ')
    word_dict[relation[0]] = relation[0:20]
while True:
    word = input('Comet name : ')
    if word in word_dict:
        print ('Comets in list :' , word_dict[word])
        print(filename) #show file location
    else:
        print( 'No comet data!')
    print(word_dict) #show data from dictionary
As you can see, the data in my dictionary isn't the comet data.
It should be
Typing in "a"
In theory the code works, so is the problem in how the dictionary is created?
Maybe I'm completely wrong, but it doesn't work with tuples or lists either. Or would it be better to copy the data into a .csv file?
Best regards

You saved the file path to the filename variable but never opened the file for reading. You should open the file and then read it:
import os.path
word_dict = {}
scriptpath = os.path.dirname(__file__)
file_path = os.path.join(scriptpath, 'CometEls.txt')
with open(file_path, 'r') as comet_file:
    for line in comet_file:
        line = line.strip()
        relation = line.split(' ')
        word_dict[relation[0]] = relation[0:20]
while True:
    word = input('Comet name : ')
    if word in word_dict:
        print('Comets in list :', word_dict[word])
        print(file_path)
    else:
        print('No comet data!')
    print(word_dict) #show data from dictionary
Also, the indentation in the while block of your example is wrong; please check it.
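One detail worth checking when parsing such a file: `line.split(' ')` splits on every single space, so if the columns are padded with several spaces the result contains empty strings, while `line.split()` with no argument splits on any run of whitespace. A small sketch of the difference, using a made-up line rather than real comet data (`build_index` is a hypothetical helper):

```python
def build_index(lines):
    """Map the first whitespace-separated token of each line to all its tokens."""
    index = {}
    for line in lines:
        tokens = line.split()  # no argument: collapses runs of whitespace
        if tokens:             # skip blank lines
            index[tokens[0]] = tokens
    return index


made_up_line = "X1   2020  01"           # placeholder, not real comet data
print(made_up_line.split(" "))           # ['X1', '', '', '2020', '', '01']
print(made_up_line.split())              # ['X1', '2020', '01']
print(build_index([made_up_line, ""]))   # {'X1': ['X1', '2020', '01']}
```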

How to automatically update a list of filenames in Python

I have a list of filenames: files = ["untitled.txt", "example.txt", "alphabet.txt"]
I also have a function to create a new file:
import sys
import time

def create_file(file):
    """Creates a new file."""
    with open(file, 'w') as nf:
        is_first_line = True
        while True:
            line = input("Line? (Type 'q' to quit.) ")
            if line == "q":
                # Detects if the user wants to quit.
                time.sleep(5)
                sys.exit()
            else:
                line = line + "\n"
                if is_first_line == False:
                    nf.write(line)
                else:
                    nf.write(line)
                    is_first_line = False
I want the list to update itself after the file is created. However, if I just call files.append() with the new name, I realize the list would only stay updated for the duration of the program. Does anybody know how to do this? Is this possible in Python?

"Is this possible in Python?" -> This has nothing to do with limitations of the language you chose to solve your problem. What you want here is persistence. You could just store the list of files in a text file. Instead of hardcoding the list in your code your program would then read the content every time it is run.
This code could get you started:
with open("files.txt") as infile:
    files = [f.strip() for f in infile.readlines()]
print(f"files: {files}")
# here do some stuff and create file 'new_file'
new_file = 'a_new_file.txt'
files.append(new_file)
###
with open("files.txt", "w") as outfile:
    outfile.write("\n".join(files))
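A slightly more defensive sketch of the same idea, assuming the list lives in a text file with one filename per line. `load_files` and `remember_file` are illustrative names, not part of the original program. The sketch tolerates a missing list file on the first run and avoids storing the same name twice.

```python
from pathlib import Path


def load_files(list_path):
    """Read the saved filenames, or return an empty list on the first run."""
    path = Path(list_path)
    if not path.exists():
        return []
    return [name for name in path.read_text().splitlines() if name]


def remember_file(list_path, new_file):
    """Add new_file to the saved list (if absent) and return the updated list."""
    files = load_files(list_path)
    if new_file not in files:
        files.append(new_file)
        Path(list_path).write_text("\n".join(files) + "\n")
    return files
```

Calling `remember_file("files.txt", file)` right after `create_file(file)` would keep the stored list in step with the files that were actually created.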

Use of For loop in processing directory contents in Python

I am attempting to loop through a series of text files in a directory, looking for occurrences of certain types of words and prefixing each word found with a user-defined tag. My code is as follows.
ACC_Tagged_Test = 'C:/ACC_Tag_Test'
for filename in glob.glob(os.path.join(ACC_Tagged_Test, '*.txt')):
    with open(filename) as f:
        data = f.read()
data = data.lower()
modals = {"could":1, "would":1, "should":1, "can":1, "may":1, "might":1}
personal_attribute = {"believes":1, "guess":1, "surmise":1, "considers":1,
                      "presume":1, "speculate":1, "postulate":1, "surmised":1, "assume":1}
approx_adapt = {"broadly":1, "mainly":1, "mostly":1, "loosely":1,
                "generally":1, "usually":1,"typically":1, "regularly":1, "widely":1}
plaus_shields = {"wonder":1, "suspect":1, "theorize":1, "hypothesize":1,
                 "cogitate":1, "contemplate":1, "deliberate":1}
format_modal = "<555>{} ".format
format_attribute = "<666>{} ".format
format_app_adaptor = "<777>{} ".format
format_plaus_shield = "<888>{} ".format
data = " ".join(format_modal(word) if word in modals else word for word in data.split())
data = " ".join(format_attribute(word) if word in personal_attribute else word for word in data.split())
data = " ".join(format_app_adaptor(word) if word in approx_adapt else word for word in data.split())
data = " ".join(format_plaus_shield(word) if word in plaus_shields else word for word in data.split())
with open (filename, "w") as f:
    f.write(str(data))
print(data) # This is just added in order to check on screen all files
            # Are being processed.
My problem is that although the code works on the last file in the directory, it is not working on the previous files (it works on 1 out of 10 in this case). I've tried a second for loop above the file write-out statements, but that is not working at all. Can anyone explain what I'm doing wrong here?
regards

My guess is that your code only processes the last file because it is not indented so that all the relevant code sits inside the for loop. Try this indentation:
import glob
import os

ACC_Tagged_Test = 'C:/ACC_Tag_Test'
for filename in glob.glob(os.path.join(ACC_Tagged_Test, '*.txt')):
    with open(filename) as f:
        data = f.read()
    data = data.lower()
    modals = {"could":1, "would":1, "should":1, "can":1, "may":1, "might":1}
    personal_attribute = {"believes":1, "guess":1, "surmise":1, "considers":1,
                          "presume":1, "speculate":1, "postulate":1, "surmised":1, "assume":1}
    approx_adapt = {"broadly":1, "mainly":1, "mostly":1, "loosely":1,
                    "generally":1, "usually":1,"typically":1, "regularly":1, "widely":1}
    plaus_shields = {"wonder":1, "suspect":1, "theorize":1, "hypothesize":1,
                     "cogitate":1, "contemplate":1, "deliberate":1}
    format_modal = "<555>{} ".format
    format_attribute = "<666>{} ".format
    format_app_adaptor = "<777>{} ".format
    format_plaus_shield = "<888>{} ".format
    data = " ".join(format_modal(word) if word in modals else word for word in data.split())
    data = " ".join(format_attribute(word) if word in personal_attribute else word for word in data.split())
    data = " ".join(format_app_adaptor(word) if word in approx_adapt else word for word in data.split())
    data = " ".join(format_plaus_shield(word) if word in plaus_shields else word for word in data.split())
    with open (filename, "w") as f:
        f.write(str(data))
    print(data) # This is just added in order to check on screen all files
                # Are being processed.
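As a side note, the four passes over the text can be collapsed into one lookup per word. The sketch below is an assumption-laden rewrite, not the asker's code: `tag_text`, `TAGS`, and `WORD_TO_TAG` are hypothetical names, only a few words per category are listed, and the tagged words are joined with single spaces.

```python
# Hypothetical mapping from tag to the words it marks (abbreviated lists).
TAGS = {
    "<555>": {"could", "would", "should", "can", "may", "might"},
    "<666>": {"believes", "guess", "surmise"},
    "<777>": {"broadly", "mainly", "mostly"},
    "<888>": {"wonder", "suspect", "theorize"},
}

# Invert the mapping once so each word can be looked up directly.
WORD_TO_TAG = {word: tag for tag, words in TAGS.items() for word in words}


def tag_text(text):
    """Lower-case the text and prefix every listed word with its tag."""
    words = text.lower().split()
    return " ".join(WORD_TO_TAG.get(word, "") + word for word in words)


print(tag_text("You could mainly wonder"))
# you <555>could <777>mainly <888>wonder
```

Inside the for loop over filenames, `data = tag_text(data)` would then stand in for the four `" ".join(...)` lines.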

This assumes all of your code is supposed to be inside your for loop. You are overwriting your text file, so it looks as if only your last run is working:
# this overwrites the file
with open(filename, "w") as fh:
    fh.write(str(data))
Change it to:
# this appends to the file
with open(filename, "a") as fh:
    fh.write(str(data))
This appends to your text file and does not overwrite previously added data with the data from the last loop. Note that in append mode the original, untagged text stays in the file and the tagged copy is written after it.
