Python: how to skip the repeated lines in an input file?

I am reading from a file and I want to read each line separately, because every third line in the output has to be a combination of the previous two lines. This is a small example:
Input:
<www.example.com/apple> <Anything>
<www.example.com/banana> <Anything>
Output:
<www.example.com/apple> <Anything>
<www.example.com/banana> <Anything>
<Apple> <Banana>
If a line is repeated or empty, I do not want to process it; I want to get only two different lines each time.
This is a part of my real input:
<http://catalog.data.gov/bread> <http://dbpedia.org>
<http://catalog.data.gov/bread> <http://dbpedia.org>
<http://catalog.data.gov/bread> <http://dbpedia.org>
<http://catalog.data.gov/bread> <http://dbpedia.org>
<http://catalog.data.gov/roll> <http://dbpedia.org>
<http://catalog.data.gov/roll> <http://dbpedia.org>
In this case I want the output to be like this:
<http://catalog.data.gov/bread> <http://dbpedia.org>
<http://catalog.data.gov/roll> <http://dbpedia.org>
<bread> <roll>
This is my code:
file = open('rdfs.txt')
for id, line in enumerate(file):
    if id % 2 == 0:
        if line.isspace():
            continue
        line1 = line.split()
        sub_line1, rel_line1 = line1[0], line1[1]
        sub_line1 = sub_line1.lstrip("<").rstrip(">")
        print(sub_line1)
    else:
        if line.isspace():
            continue
        line2 = line.split()
        sub_line2, rel_line2 = line2[0], line2[1]
        sub_line2 = sub_line2.lstrip("<").rstrip(">")
        print(sub_line2)
It works, but I get every line. How can I check whether a line is equal to the one before it and, if so, skip lines until a different one appears?
The output I am getting now:
http://catalog.data.gov/bread
http://catalog.data.gov/bread
http://catalog.data.gov/roll
http://catalog.data.gov/roll
Thanks!!

You can declare a set() named lines_seen that holds every line already seen, skip any new line that is blank or already in lines_seen, and add it otherwise. Because skipped lines no longer count, track the position of the processed lines with a separate counter instead of the enumerate index; otherwise a skipped duplicate shifts the pairing.
Your code should look like this:
lines_seen = set()  # holds lines already seen
count = 0           # number of distinct, non-blank lines processed so far
with open('rdfs.txt') as file:
    for line in file:
        key = line.strip()
        if not key or key in lines_seen:
            continue  # skip blank lines and duplicates
        lines_seen.add(key)
        subject = line.split()[0].lstrip("<").rstrip(">")
        if count % 2 == 0:
            sub_line1 = subject
            print(sub_line1)
        else:
            sub_line2 = subject
            print(sub_line2)
        count += 1
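The loop above skips duplicates but still only prints the subjects; it does not build the third, combined line the question asks for. A minimal sketch of that step follows. It assumes the combined line is made from the last path segment of each line's first <...> token (so <http://catalog.data.gov/bread> becomes bread), which matches the real-input example; the function name combine_pairs is hypothetical.

```python
def combine_pairs(lines):
    """Yield each pair of distinct, non-blank lines followed by a combined line.

    Assumption: the combined line is "<a> <b>", where a and b are the last
    path segments of the first <...> token on each line of the pair.
    """
    seen = set()  # stripped lines already processed
    pair = []     # the current pair of distinct lines
    for line in lines:
        key = line.strip()
        if not key or key in seen:
            continue  # skip blank and repeated lines
        seen.add(key)
        pair.append(key)
        if len(pair) == 2:
            # "<http://catalog.data.gov/bread> ..." -> "bread"
            names = [p.split()[0].strip("<>").rstrip("/").rsplit("/", 1)[-1]
                     for p in pair]
            yield pair[0]
            yield pair[1]
            yield "<{}> <{}>".format(*names)
            pair = []
    # a final unpaired line, if any, is dropped
```

To use it, open rdfs.txt and print every line yielded by combine_pairs(file). Using a generator keeps the file streaming, so only the set of seen lines grows with the input.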

Related

How to delete from specific line until specific line in a file on python

I have a file like this:
a
D
c
a
T
c
a
R
c
I want to delete from a specific line (in this case 3) to another specific line (in this case 5), so the file would look like this:
a
D
c
a
R
c

I think this should work:
def delete_line_range(filename, start_line, end_line):
    # read all lines
    with open(filename, 'r') as f:
        lines = f.readlines()
    with open(filename, 'w') as f:
        # iterate through the lines
        for i in range(len(lines)):
            # write only the lines outside the range
            if i < start_line or i > end_line:
                f.write(lines[i])
Or, if you want to delete multiple ranges:
def delete_line_ranges(filename, line_ranges):
    # read all lines
    with open(filename, 'r') as f:
        lines = f.readlines()
    with open(filename, 'w') as f:
        # iterate through the lines
        for i in range(len(lines)):
            # check whether the line falls in any range
            in_range = False
            for line_range in line_ranges:
                if line_range[0] <= i <= line_range[1]:
                    in_range = True
                    break
            # if not in any range, write the line
            if not in_range:
                f.write(lines[i])
Here line_ranges is a list of (start, end) tuples of line numbers.
Be aware that in both functions the line numbers start at 0.
This means that to delete lines 3 to 5 as in your example, you need to subtract 1 from both the start and end line numbers:
delete_line_range('test.txt', 2, 4) # deletes lines 3 to 5 if you start counting from 1
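If you would rather pass the 1-based line numbers from the question directly, the same filtering can be done with enumerate(..., start=1). A minimal sketch working on a list of lines; the name delete_lines is hypothetical, and reading and writing the file would stay as in delete_line_range:

```python
def delete_lines(lines, start_line, end_line):
    """Return lines without the 1-based, inclusive range start_line..end_line."""
    return [line for number, line in enumerate(lines, start=1)
            if not start_line <= number <= end_line]


lines = ["a", "D", "c", "a", "T", "c", "a", "R", "c"]
print(delete_lines(lines, 3, 5))  # ['a', 'D', 'c', 'a', 'R', 'c']
```

This reproduces the expected file contents from the question without any off-by-one adjustment.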
Edit
Deleting contacts from a .vcf file.
Get the line range of a specific contact:
def get_contact_range(filename, name):
    # read all lines
    with open(filename, 'r') as f:
        lines = f.readlines()
    start_line = None
    for i in range(len(lines)):
        # remember where the current contact starts
        if lines[i].startswith('BEGIN:VCARD'):
            start_line = i
            continue
        if name in lines[i] and start_line is not None:
            for j in range(i, len(lines)):
                # check if line is end of contact
                if lines[j].startswith('END:VCARD'):
                    end_line = j
                    return (start_line, end_line)
    return None
print(get_contact_range('test.vcf', 'John Doe'))
test.vcf
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//macOS 11.5.2//EN
N:Doe;John;;;
FN:John Doe
ORG:Sharpened Productions;
EMAIL;type=INTERNET;type=HOME;type=pref:johndoe#email.com
EMAIL;type=INTERNET;type=WORK:johndoe#workemail.com
TEL;type=CELL;type=VOICE;type=pref:123-456-7890
ADR;type=HOME;type=pref:;;12345 First Avenue;Hometown;NY;12345;United States
ADR;type=WORK:;;67890 Second Avenue;Businesstown;NY;67890;United States
NOTE:The man I met at the company networking event. He mentioned that he had some potential leads.
item1.URL;type=pref:https://fileinfo.com/
item1.X-ABLabel:_$!!$_
BDAY:2000-01-01
END:VCARD
Output:
(0, 15)
Combine the above functions:
def delete_contact(filename, name):
    # get the start and end line of the contact
    contact_range = get_contact_range(filename, name)
    # delete the contact, if it was found
    if contact_range is not None:
        delete_line_range(filename, contact_range[0], contact_range[1])
Multiple contacts at once:
def delete_contacts(filename, names):
    # get the start and end line of each contact
    line_ranges = []
    for name in names:
        contact_range = get_contact_range(filename, name)
        if contact_range is not None:  # skip names that were not found
            line_ranges.append(contact_range)
    # delete all contacts in one pass
    delete_line_ranges(filename, line_ranges)
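delete_contacts calls get_contact_range once per name, which rereads the file each time. An alternative sketch makes one pass over the lines, buffering each BEGIN:VCARD ... END:VCARD block and dropping it when its FN: value exactly equals one of the names. Matching on FN: is an assumption made here (get_contact_range matches the name anywhere in the card, including NOTE: lines), and remove_vcards is a hypothetical name:

```python
def remove_vcards(lines, names):
    """Return lines without any vCard whose FN: value is in names."""
    names = set(names)
    kept = []
    card = None  # lines of the vCard currently being read, or None
    for line in lines:
        if line.startswith("BEGIN:VCARD"):
            card = [line]
        elif card is not None:
            card.append(line)
            if line.startswith("END:VCARD"):
                # "FN:John Doe\n" -> "John Doe"
                full_names = {c.rstrip("\r\n")[3:] for c in card if c.startswith("FN:")}
                if not full_names & names:
                    kept.extend(card)  # keep contacts that were not requested
                card = None
        else:
            kept.append(line)  # text outside any vCard is kept as-is
    if card is not None:
        kept.extend(card)  # keep an unterminated card unchanged
    return kept
```

Read test.vcf with readlines(), pass the result to remove_vcards, and write the returned lines back with writelines(). Exact FN: matching avoids deleting a contact whose notes merely mention the name.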

Deleting a specific word from a text file in python

I used this code to delete a word from a text file.
f = open('./test.txt','r')
a = ['word1','word2','word3']
lst = []
for line in f:
    for word in a:
        if word in line:
            line = line.replace(word,'')
    lst.append(line)
f.close()
f = open('./test.txt','w')
for line in lst:
    f.write(line)
f.close()
But for some reason, if the words share characters, all of those characters get deleted. For example,
in my code:
def cancel():
    global refID
    f1=open("refID.txt","r")
    line=f1.readline()
    flag = 0
    while flag==0:
        refID=input("Enter the reference ID or type 'q' to quit: ")
        for i in line.split(','):
            if refID == i:
                flag=1
        if flag ==1:
            print("reference ID found")
            cancelsub()
        elif (len(refID))<1:
            print("Reference ID not found, please re-enter your reference ID\n")
            cancel()
        elif refID=="q":
            flag=1
        else:
            print("reference ID not found\n")
            menu()
def cancelsub():
    global refIDarr, index
    refIDarr=[]
    index=0
    f = open('flightbooking.csv')
    csv_f = csv.reader(f)
    for row in csv_f:
        refIDarr.append(row[1])
    for i in range (len(refIDarr)):
        if refID==refIDarr[i]:
            index=i
            print(index)
    while True:
        proceed=input("You are about to cancel your flight booking, are you sure you would like to proceed? y/n?: ")
        while proceed>"y" or proceed<"n" or (proceed>"n" and proceed<"y") :
            proceed=input("Invalid entry. \nPlease enter y or n: ")
        if proceed=="y":
            Continue()
            break
        elif proceed=="n":
            main_menu
            break
        exit
        break
def Continue():
    lines = list()
    with open('flightbooking.csv', 'r') as readFile:
        reader = csv.reader(readFile)
        for row in reader:
            lines.append(row)
            for field in row:
                if field ==refID:
                    lines.remove(row)
                    break
    with open('flightbooking.csv', 'w') as writeFile:
        writer = csv.writer(writeFile)
        writer.writerows(lines)
    f = open('refID.txt','r')
    a=refIDarr[index]
    print(a)
    lst = []
    for line in f:
        for word in a:
            if word in line:
                line = line.replace(word,'')
        lst.append(line)
    print(lst)
    f.close()
    f = open('refID.txt','w')
    for line in lst:
        f.write(line)
    f.close()
    print("Booking successfully cancelled")
    menu()
When the code runs, the refID variable holds one word, and the code should remove just that word. Instead, it takes that word (e.g. 'AB123'), finds all other words that contain an 'A', a 'B', or any of the digits, and removes those characters from them. How do I make it delete only the word?
Text file before running code:
AD123,AB123
Expected Output in the text file:
AD123,
Output in text file:
D,
Edit: I have added the entire code, and maybe you can help now after seeing that the array is being appended to and then being used to delete from a text file.

Here's my take:
refIDarr = ["AB123"]
a = refIDarr[0] => a = "AB123"
Strings in Python are iterable, so when you write for word in a, the loop runs 5 times and each word is actually a single character.
Something like the following is being executed.
if "A" in line:
    line = line.replace("A","")
if "B" in line:
    line = line.replace("B","")
if "1" in line:
    line = line.replace("1","")
if "2" in line:
    line = line.replace("2","")
if "3" in line:
    line = line.replace("3","")
The correct way to do this is to loop over refIDarr:
for word in refIDarr:
    line = line.replace(word,'')
NOTE: You don't need the if statement: if the word is not in the line, replace() returns the line unchanged.
"abc".replace("bananan", "") => "abc"
Here's a working example:
refIDarr = ["hello", "world", "lol"]
with open('mytext.txt', "r") as f:
    data = f.readlines()
for word in refIDarr:
    data = [line.replace(word, "") for line in data]
with open("mytext.txt", "w") as newf:
    newf.writelines(data)
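str.replace removes substrings, so an ID such as AB123 would also be cut out of a longer ID like AB1234. Also, in the question's code refIDarr holds every reference ID read from flightbooking.csv, so only the cancelled ID (refIDarr[index]) should be passed in. A sketch that treats each line of refID.txt as comma-separated IDs and drops only exact matches; both function names are hypothetical:

```python
def remove_ids_from_line(line, ids_to_remove):
    """Drop comma-separated fields that exactly match one of ids_to_remove."""
    remove = set(ids_to_remove)
    kept = [field for field in line.split(",") if field.strip() not in remove]
    return ",".join(kept)


def remove_ids_from_file(path, ids_to_remove):
    """Rewrite path, removing the given IDs from every line."""
    with open(path) as f:
        lines = f.read().splitlines()  # newline characters are stripped here
    with open(path, "w") as f:
        for line in lines:
            f.write(remove_ids_from_line(line, ids_to_remove) + "\n")
```

For example, remove_ids_from_file('refID.txt', [refIDarr[index]]) turns AD123,AB123 into AD123; the separator goes away together with the removed ID.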

The problem is here:
a=refIDarr[index]
If refIDarr is a list of words, indexing it gives a single word, so a is a string. Later, when you iterate over a (for word in a:), each word is a single character rather than the whole word you expect, so you end up replacing the characters of the word instead of the word itself in your file.
To avoid that, remove a=refIDarr[index] and change your loop to:
for line in f:
    for word in refIDarr:
        if word in line:
            line = line.replace(word,'')

Trouble with matching variables to line in txt, and removing line

I am having trouble matching variables to lines in a .txt file and removing those lines.
I am writing a hotel room booking program, and I am having trouble removing a booking from my text file.
Each line in my text file is formatted like this:
jeff;jeff#gmail.com;123123123;2019-06-09;2019-06-10;Single Room
first_name1, phonenumber1 and email1 are linked to entry boxes:
def edit_details(self,controller):
    f = open("Bookings.txt")
    lines = f.readlines()
    f.close()
    x = -1
    for i in lines:
        x += 1
        data = lines[x]
        first_name1 = str(controller.editName.get())
        phonenumber1 = str(controller.editPhone.get())
        email1 = str(controller.editEmail.get())
        checkfirst_name, checkemail, checkphone_num, checkclock_in_date, checkclock_out_date, checkroom = map(str, data.split(";"))
        if checkfirst_name.upper() == first_name1.upper() and checkemail.upper() == email1.upper() and checkphone_num == phonenumber1:
            controller.roomName.set(checkfirst_name)
            controller.roomEmail.set(checkemail)
            controller.roomPhone.set(checkphone_num)
            controller.roomCheckin.set(checkclock_in_date)
            controller.roomCheckout.set(checkclock_out_date)
            controller.roomSelect.set(checkroom)
            print(controller.roomName.get())
            print(controller.roomSelect.get())
            controller.show_frame("cancelBooking")
            break
        elif x > len(lines) - int(2):
            messagebox.showerror("Error", "Please Enter Valid Details")
            break
I have the user enter their details to get these variables, but I don't know how to match them to the line in the text file so that I can remove the booking.
Do I have to format these variables to match the line?
This is what I have tried, but it deletes the last line in my file:
line_to_match = ';'.join([controller.roomName.get(),controller.roomEmail.get(),controller.roomPhone.get()])
print(line_to_match)
with open("Bookings.txt", "r+") as f:
    line = f.readlines()
    f.seek(0)
    for i in line:
        if i.startswith(line_to_match):
            f.write(i)
    f.truncate()

You can join the variables with ; and check whether each line starts with those details. To remove the booking, write back every line that does not match, like below:
first_name1, email1, phonenumber1 = 'jeff', 'jeff#gmail.com', '123123123'
line_to_match = ';'.join([first_name1, email1, phonenumber1])
with open("Bookings.txt", "r+") as f:
    lines = f.readlines()
    f.seek(0)
    for i in lines:
        # write back every line except the booking to remove
        if not i.startswith(line_to_match):
            f.write(i)
    f.truncate()
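A startswith check can also match a different booking by accident (for example, the phone number 123123123 is a prefix of 1231231234). A sketch that instead splits each line on ; and compares the first three fields exactly, upper-casing name and email the way edit_details does. It assumes the name;email;phone;check-in;check-out;room layout shown in the question, and remove_booking and remove_booking_from_file are hypothetical names:

```python
def remove_booking(lines, first_name, email, phone):
    """Return the lines that do not belong to the given booking.

    Assumes each line looks like name;email;phone;check-in;check-out;room.
    Name and email are compared case-insensitively, as in edit_details.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(";")
        is_match = (
            len(fields) >= 3
            and fields[0].upper() == first_name.upper()
            and fields[1].upper() == email.upper()
            and fields[2] == phone
        )
        if not is_match:
            kept.append(line)
    return kept


def remove_booking_from_file(path, first_name, email, phone):
    """Rewrite path without the matching booking line(s)."""
    with open(path) as f:
        lines = f.readlines()
    with open(path, "w") as f:
        f.writelines(remove_booking(lines, first_name, email, phone))
```

In the GUI this could be called as remove_booking_from_file("Bookings.txt", controller.roomName.get(), controller.roomEmail.get(), controller.roomPhone.get()).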

Replace string in line without adding new line?

I want to replace a string in the line that contains patternB, something like this:
from:
some lines
line contain patternA
some lines
line contain patternB
more lines
to:
some lines
line contain patternA
some lines
line contain patternB xx oo
more lines
I have code like this:
inputfile = open("d:\myfile.abc", "r")
outputfile = open("d:\myfile_renew.abc", "w")
obj = "yaya"
dummy = ""
item = []
for line in inputfile:
    dummy += line
    if line.find("patternA") != -1:
        for line in inputfile:
            dummy += line
            if line.find("patternB") != -1:
                item = line.split()
                dummy += item[0] + " xx " + item[-1] + "\n"
                break
outputfile.write(dummy)
It does not replace the line containing "patternB" as expected; instead, it adds a new line below it, like this:
some lines
line contain patternA
some lines
line contain patternB
line contain patternB xx oo
more lines
What can I do with my code?

That is expected: you append line to dummy at the beginning of the for loop, and then append the modified version again in the "if" statement. Also, why check for patternA if you treat it like every other line?
inputfile = open("d:\myfile.abc", "r")
outputfile = open("d:\myfile_renew.abc", "w")
obj = "yaya"
dummy = ""
item = []
for line in inputfile:
    if line.find("patternB") != -1:
        item = line.split()
        dummy += item[0] + " xx " + item[-1] + "\n"
    else:
        dummy += line
outputfile.write(dummy)

The simplest approach is:
1. Read the whole file into a string.
2. Call str.replace.
3. Write the string back to the file.
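Those three steps as a minimal sketch; replace_in_file is a hypothetical name, and it assumes the whole file fits in memory:

```python
def replace_in_file(src, dst, old, new):
    """Read src, replace every occurrence of old with new, and write to dst."""
    # 1. read the whole file into a string
    with open(src) as f:
        text = f.read()
    # 2. str.replace substitutes every occurrence in one call
    text = text.replace(old, new)
    # 3. write the string back out (dst may be the same path as src)
    with open(dst, "w") as f:
        f.write(text)
```

For the question this would be called as replace_in_file(r"d:\myfile.abc", r"d:\myfile_renew.abc", "patternB", "patternB xx oo"). Note that it replaces every occurrence of patternB, not only the first one after patternA.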

If you want to keep the line-by-line iterator (for a big file):
for line in inputfile:
    if line.find("patternB") != -1:
        dummy = line.replace('patternB', 'patternB xx oo')
        outputfile.write(dummy)
    else:
        outputfile.write(line)
This holds only one line in memory at a time, so it can process big files, although it may be slower than a single str.replace on the whole text.
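The same streaming idea can rewrite the file in place using the standard library's fileinput module (swapped in here; the answer above writes to a second file instead). With inplace=True, fileinput moves the original file aside and redirects standard output into a new file with the original name, so each print call writes one output line. A sketch; replace_in_place is a hypothetical name:

```python
import fileinput


def replace_in_place(path, old, new):
    """Stream path line by line, replacing old with new in the file itself."""
    with fileinput.input(files=(path,), inplace=True) as f:
        for line in f:
            # stdout is redirected into the file being rewritten;
            # end="" because each line already ends with its newline
            print(line.replace(old, new), end="")
```

Usage: replace_in_place(r"d:\myfile.abc", "patternB", "patternB xx oo").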

This should work:
import os

def replace():
    f1 = open("d:\myfile.abc","r")
    f2 = open("d:\myfile_renew.abc","w")
    ow = input("Enter word you wish to replace:")
    nw = input("Enter new word:")
    for line in f1:
        templ = line.split()
        # swap matching words and rejoin them with single spaces
        f2.write(" ".join(nw if i == ow else i for i in templ))
        f2.write('\n')
    f1.close()
    f2.close()
    os.remove("d:\myfile.abc")
    os.rename("d:\myfile_renew.abc","d:\myfile.abc")

replace()

You can use str.replace:
s = '''some lines
line contain patternA
some lines
line contain patternB
more lines'''
print(s.replace('patternB', 'patternB xx oo'))

python: list index out of range in reducer

I'm writing the reduce part of my MapReduce program and I am getting a 'list index out of range' error on the line that reads splitLine[1]. Why is this? I was fairly sure this was correct.
import sys
cKey = ""
cList = []
lines = sys.stdin.readlines()
for line in lines:
    line = line.rstrip()
    splitLine = line.split("\t")
    key = splitLine[0]
    value = splitLine[1]
    ....
Any thoughts? Thank you!

You are trying to access splitLine[1] when there is no [1] entry. Most likely, you have either blank lines or lines that contain no \t.
A possible solution is to ignore entries that have fewer than 2 columns:
import sys
cKey = ""
cList = []
lines = sys.stdin.readlines()
for line in lines:
    line = line.rstrip()
    splitLine = line.split("\t")
    if len(splitLine) > 1:
        key = splitLine[0]
        value = splitLine[1]
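Since the rest of the reducer is elided in the question, this only sketches the parsing step: a generator that yields (key, value) pairs and skips blank or tab-less lines. Using split("\t", 1) is a choice made here so that any further tabs stay inside the value; iter_pairs is a hypothetical name:

```python
def iter_pairs(lines):
    """Yield (key, value) for each line of the form key<TAB>value."""
    for line in lines:
        parts = line.rstrip("\r\n").split("\t", 1)
        if len(parts) < 2:
            continue  # blank line or no tab: nothing to reduce
        key, value = parts
        yield key, value
```

In the reducer you would loop over iter_pairs(sys.stdin) instead of calling sys.stdin.readlines(), which also avoids loading all input into memory at once.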

You should do two things:
1. Filter out blank lines at the outset: if not re.match(r'^\s*$', line):
2. For non-blank lines, append a default value to cover edge cases with no tab (a blank space " " in this case): line+"\t "
Sample code:
import re
import sys
cKey = ""
cList = []
lines = sys.stdin.readlines()
for line in lines:
    # skip lines that are empty or contain only whitespace (\t, \n, \r, spaces)
    if not re.match(r'^\s*$', line):
        # add an extra '\t' delimiter and default value ' ' to be safe
        line = line+"\t "
        splitLine = line.split("\t")
        key = splitLine[0]
        # strip any trailing whitespace
        value = splitLine[1].rstrip()
