I created a program to compare two text files and identify duplicate and unique items, but the first for loop runs only once; after that it exits without iterating over the second item in the file. If anyone can help, please do.
f1 = open("file1.txt","r")
f2 = open("file2.txt","r")
duplicate = open("duplicate_ip.txt", "w")
unique = open("unique_ip.txt", "w")
for x in f1:
    for y in f2:
        if x == y:
            duplicate.write(y)
        else:
            unique.write(x)
file1.txt contains the following:
192.168.1.1
192.168.10.2
192.168.56.5
192.16.10.2
192.168.5.5
file2.txt contains the following:
192.168.1.2
10.10.10.0
10.10.10.11
192.168.11.111
127.0.0.1
172.16.31.5
If you want f1 and f2 to be lists of strings, then please use the readlines() method. Also, don't forget to close the files (at least the ones you write to).
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
f1 = file1.readlines()
f2 = file2.readlines()
duplicate = open("duplicate_ip.txt", "w")
unique = open("unique_ip.txt", "w")
for x in f1:
    for y in f2:
        if x == y:
            duplicate.write(y)
        else:
            unique.write(x)
file1.close()
file2.close()
duplicate.close()
unique.close()
But there is a much simpler way to manage file I/O sessions with a context manager. Your code would then look something like this:
with open("file1.txt", "r") as f1, \
     open("file2.txt", "r") as f2, \
     open("duplicate_ip.txt", "w") as duplicate, \
     open("unique_ip.txt", "w") as unique:
    f1_lines = f1.readlines()
    f2_lines = f2.readlines()
    for x in f1_lines:
        for y in f2_lines:
            if x == y:
                duplicate.write(y)
            else:
                unique.write(x)
Solution with the set operations & (intersection) and ^ (symmetric difference, i.e. XOR):
f1_ip = set(open("file1.txt", "r"))
f2_ip = set(open("file2.txt", "r"))
with open("duplicate_ip.txt", "w") as duplicate:
    for ip in f1_ip & f2_ip:
        duplicate.write(ip)
with open("unique_ip.txt", "w") as unique:
    for ip in f1_ip ^ f2_ip:
        unique.write(ip)
You can write it like this:
with open("file1.txt", "r") as f1:
    data1 = set()
    for line in f1.readlines():
        data1.add(line.strip('\n'))
with open("file2.txt", "r") as f2:
    data2 = set()
    for line in f2.readlines():
        data2.add(line.strip('\n'))
unique_list = data1.difference(data2)
duplicate_list = data1.intersection(data2)
with open("duplicate_ip.txt", "w") as duplicate:
    for ip in duplicate_list:
        duplicate.write(ip + "\n")
with open("unique_ip.txt", "w") as unique:
    for ip in unique_list:
        unique.write(ip + "\n")
To answer the question: you're opening file2 but never reading its lines into a list, so you're iterating over the file object itself. A file object can only be consumed once, so the first pass of the outer loop exhausts f2 and on every later pass the inner loop body never runs again.
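A minimal sketch of that exhaustion, assuming the same file2.txt from the question:
f2 = open("file2.txt", "r")
# The first pass reads every line and leaves the file position at the end.
print(list(f2))   # ['192.168.1.2\n', '10.10.10.0\n', ...]
# The second pass finds nothing left to read, so a nested loop body never runs again.
print(list(f2))   # []
f2.close()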
A couple of comments:
you should try to open files using with so you don't accidentally leave them open
this still won't work as-is, because the last line won't have a \n, so you should strip the newlines off before comparing
you're only checking one way: any IPs in file2 that aren't in file1 won't be found this way, and I'm not sure if that's what you want
there are faster ways to check which items are the same in two lists and which are unique (see the sketch after the code below)
To solve the main issue:
with open("file1.txt", "r") as f1:
    data1 = f1.readlines()
with open("file2.txt", "r") as f2:
    data2 = f2.readlines()
with open("duplicate_ip.txt", "w") as duplicate, \
     open("unique_ip.txt", "w") as unique:
    for x in data1:
        for y in data2:
            if x == y:
                duplicate.write(y)
            else:
                unique.write(x)
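As for the stripped-newline, faster set-based approach mentioned in the comments above, here is a rough sketch (the variable names are mine, not from the original code):
# Strip trailing newlines so the last line (which may lack '\n') compares equal too.
with open("file1.txt") as f1, open("file2.txt") as f2:
    ips1 = {line.strip() for line in f1}
    ips2 = {line.strip() for line in f2}
with open("duplicate_ip.txt", "w") as duplicate, open("unique_ip.txt", "w") as unique:
    # Intersection: IPs present in both files.
    for ip in ips1 & ips2:
        duplicate.write(ip + "\n")
    # Symmetric difference: IPs present in exactly one file, checked in both directions.
    for ip in ips1 ^ ips2:
        unique.write(ip + "\n")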
I had to redo my question because it made everyone focus on the wrong word. Sorry about this, guys, but I did put that I have 100 rows with different code names.
This is my working code:
with open("file1.txt", "r+") as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if "nman" not in line:
            f.write(line)
    f.truncate()
inside the text file
Before = file1.txt
"nman": "afklafjlka"
"prvr": "521.0.25",
"prvrfi": "1.18.3",
RESULTS = file1.txt
"prvr": "521.0.25",
"prvrfi": "1.18.3",
As you can see in my result, the whole row containing the code "nman" was removed.
I made something in batch for this, but it's way too slow. I used this batch script:
findstr /V "\<nman\> \<prvr\>" file1.txt > file2.txt
So the end result of the updated script should be able to handle many different code names, just like my batch script.
with open("file1.txt", "r+") as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if "nman" "prvr" not in line: < --------
            f.write(line)
    f.truncate()
or something like this
to_delete = ["nman", "prvr"] < ------
with open("file1.txt", "r+") as f:
    new_f = f.readlines()
    f.seek(0)
    for line in new_f:
        if to_delete not in line: < --------
            f.write(line)
    f.truncate()
Working script, thank you:
with open("file1.txt", 'r') as f:
    lines = f.readlines()
to_delete = ["nman",
             "prvr"]
new_lines = []
for l in lines:
    for d in to_delete:
        if d in l:
            l = ""
            break
    new_lines.append(l)
with open("file2.txt", 'w') as file2:
    file2.writelines(new_lines)
If there's a rule, you can delete the rows in a single pass with regular expressions.
This one deletes all the rows that contain "code" followed by a number.
from re import sub
with open('file1.txt', 'r') as f:
    text = f.read()
with open('file1.txt', 'w') as f:
    f.write(sub(r'("code\d+".*(\n|))', '', text))
If there's a rule but you don't want to build the whole RegEx yourself, or you don't know how to, then you can use a check against a list of codes to decide whether a row is good or bad:
from re import search
code_list = ['1', '2', '3']
with open('file1.txt', 'r') as f:
    text = f.readlines()
with open('file1.txt', 'w') as f:
    f.writelines([_ for _ in text if not any([search(f'code{i}', _) for i in code_list])])
Or within a range:
from re import search
with open('file1.txt', 'r') as f:
    text = f.readlines()
with open('file1.txt', 'w') as f:
    f.writelines([_ for _ in text if not any([search(f'code{i}', _) for i in range(100)])])
Given Input File:
"nman": "afklafjlka"
"prvr": "521.0.25",
"prvrfi": "1.18.3",
| acts like a boolean OR in regex
import re

words = ['"nman"', '"prvr"']  # USE THE '""' OR YOU MAY DELETE MORE THAN EXPECTED
words = '|'.join(words)       # == "nman"|"prvr"

with open('test.txt', 'r+') as f:
    # Keep only the lines where the OR pattern we created earlier does NOT match.
    lines = [x for x in f.readlines() if not re.match(words, x)]
    # Move back to the start of the file before rewriting it.
    f.seek(0)
    # Write the lines we kept back to the file.
    f.writelines(lines)
    # Cut off everything left over from the old content.
    f.truncate()
Output:
"prvrfi": "1.18.3",
I have two txt files with a string per line. I want to compare txt file 1 with txt file 2 and generate a new file with all the strings that are in 2 but not in 1.
I tried something rather simple:
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
for word in file2:
    if word not in file1:
        print(word)
What this code does is give me all the strings from file2 if there is ANY string not in file1.
**file1:**
this
is
a
word
**file2:**
this
is
a
totally
different
word
What I would expect is only the strings "totally" and "different", but I get all the strings from file2.
The open() function returns a file object, not the string that is inside that file. In order to read the text, you need to call a read method after opening the file. It's also a good habit to always close files once you've opened them; that's why I personally prefer with open('file1.txt', 'r') as file1:, which makes sure the file is always closed when I'm done with it, without having to call the close method explicitly. It would look like this:
with open('file1.txt', 'r', encoding='utf8') as file1:
    with open('file2.txt', 'r', encoding='utf8') as file2:
        text1 = file1.read()
        text2 = file2.read()
        words1 = text1.split('\n')
        words2 = text2.split('\n')
        unique_words = list(filter(lambda w2: w2 not in words1, words2))
        for word in unique_words:
            print(word)
You can use the .readlines() method, which turns the lines of the text file into the elements of a list (it generates a list of those lines).
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
f1 = file1.readlines()
f2 = file2.readlines()
for word in f2:
    if word not in f1:
        print(word)
How about this:
FILES = ['file1.txt', 'file2.txt']
SETS = []
for _file in FILES:
    with open(_file) as infile:
        _lines = [line.strip() for line in infile.readlines()]
        SETS.append(set(_lines))
print(SETS[1] - SETS[0])
Try this solution:
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
word1 = [x for x in file1]
word2 = [x for x in file2]
words = [x.strip("\n") for x in word2 if x not in word1]
Output:
['totally', 'different']
I wrote x.strip("\n") because my example file contains one word per row.
How can I make this work to delete values from my txt file? I have an 11-line file of 4-digit numbers; I can add, but I am stuck on deleting.
def delete_file(string_in):
    print("deleted_from_file")
    with open('test.txt', 'a+') as f:
        d = f.readlines()
        f.seek(11)
        for i in d:
            if i != " item = {} ":
                f.write(i)
        f.close()
a+ mode means that writes always go to the end of the file, so the seek has no effect: the lines are always written after all the existing lines.
It would be easier to just open the file separately for reading and writing. Otherwise you also need to truncate the file after all the writes.
BTW, you don't need to use f.close() when you use with -- it automatically closes (that's the whole point).
The lines returned by readlines() end with newlines; you need to strip those off before comparing with the string.
def delete_file(string_in):
    print("deleted_from_file")
    with open('test.txt', 'r') as f:
        d = f.readlines()
    with open('test.txt', 'w') as f:
        for i in d:
            if i.rstrip('\n') != " item = {} ":
                f.write(i)
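For comparison, a minimal sketch of the single-handle variant hinted at above, which keeps one r+ handle and therefore has to seek back and truncate itself (same hypothetical " item = {} " line as in the question):
def delete_file(string_in):
    print("deleted_from_file")
    with open('test.txt', 'r+') as f:
        d = f.readlines()
        # Go back to the start before rewriting the file in place.
        f.seek(0)
        for i in d:
            if i.rstrip('\n') != " item = {} ":
                f.write(i)
        # Cut off whatever is left over from the old, longer content.
        f.truncate()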
You can store all the needed lines in a list using a list comprehension, and then write the lines back into the file after it has been emptied out:
def delete_file(string_in):
    print("deleted_from_file")
    with open('test.txt', 'r') as f:
        d = [i for i in f if i.strip('\n') != " item = {} "]
    with open('test.txt', 'w') as f:
        f.write(''.join(d))
This is my first time here and I hope to get your help. I'm new to Python and I need your help: I have two .txt files.
Here is an example.
file1.txt
customer1.com
customer2.com
customer3.com
customer4.com
customer5.com
customer6.com
customer7.com
customer8.com
customer9.com
file2.txt
service1
service2
service3
I want to loop file2.txt over file1.txt, like the following example:
customer1.com/service1
customer1.com/service2
customer1.com/service3
customer2.com/service1
customer2.com/service2
customer2.com/service3
customer3.com/service1
customer3.com/service2
customer3.com/service3
and so on until file1.txt is done.
I also need to add an if statement. For example, let's say customer number 3 has service number 2 (the file is found, I mean):
customer3.com/service2 [service found]
I need the loop for customer3 to stop looking for services and save the output (customer3.com/service2) in a new file called file3.txt, and then the loop goes on with the other customers; for every customer where a service is found, the output is saved in file3.txt.
I hope you understand what I mean.
Thanks.
You could use itertools.product to get the Cartesian product of the lines from each file, i.e. every URL combination:
from itertools import product

with open("file1.txt") as f1, open("file2.txt") as f2, open("file3.txt", mode="w") as out:
    for x, y in product(f1, f2):
        out.write("%s/%s\n" % (x.strip(), y.strip()))
file3.txt
customer1.com/service1
customer1.com/service2
customer1.com/service3
customer2.com/service1
customer2.com/service2
customer2.com/service3
customer3.com/service1
customer3.com/service2
...
The loop task is easy: you need to read each file and save the data as a list, then write a file in that loop order; see the example. But I do not understand that blank line and that "service found" logic. It is too general; please be more specific.
Example:
list1, list2 = [], []
with open("file1.txt", "r") as f1:
    line = f1.readline()
    while line:
        line = line.strip()
        list1.append(line)
        line = f1.readline()
with open("file2.txt", "r") as f2:
    line = f2.readline()
    while line:
        line = line.strip()
        list2.append(line)
        line = f2.readline()
with open("file3.txt", "w") as f3:
    for i in list1:
        for j in list2:
            f3.write(f"{i}/{j}\n")
        f3.write("\n")  # just for that blank line
Try this to read the files line by line and use them accordingly.
file1 = open('file1.txt', 'r')
file2 = open('file2.txt', 'r')
lines1 = file1.readlines()
lines2 = file2.readlines()
for line_from_1 in lines1:
    for line_from_2 in lines2:
        print(line_from_1.strip() + '/' + line_from_2.strip())
I am working on a Python program where the goal is to create a tool that takes the first word from a file and puts it beside another line in a different file.
This is the code snippet:
lines = open("x.txt", "r").readlines()
lines2 = open("c.txt", "r").readlines()
for line in lines:
    r = line.split()
    line1 = str(r[0])
    for line2 in lines2:
        l2 = line2
        rn = open("b.txt", "r").read()
        os = open("b.txt", "w").write(rn + line1 + "\t" + l2)
but it doesn't work correctly.
My question is: I want this tool to take the first word from a file and put it beside a line from another file, for all lines in the file.
For example:
File: 1.txt :
hello there
hi there
File: 2.txt :
michal smith
takawa sama
I want the result to be :
Output:
hello michal smith
hi takawa sama
By using the zip function, you can loop through both simultaneously. Then you can pull the first word from your greeting and add it to the name to write to the file.
greetings = open("x.txt", "r").readlines()
names = open("c.txt", "r").readlines()
with open("b.txt", "w") as output_file:
    for greeting, name in zip(greetings, names):
        greeting = greeting.split(" ")[0]
        output = "{0} {1}\n".format(greeting, name.rstrip("\n"))
        output_file.write(output)
Yes, as Tigerhawk indicated, you want to use the zip function, which pairs up elements from different iterables at the same index into tuples (the ith tuple holding the elements at index i from each iterable).
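For instance, a quick illustration of what zip produces, using made-up lists shaped like the two files above:
greetings = ["hello there\n", "hi there\n"]
names = ["michal smith\n", "takawa sama\n"]
# zip pairs the items index by index.
print(list(zip(greetings, names)))
# [('hello there\n', 'michal smith\n'), ('hi there\n', 'takawa sama\n')]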
Example code -
lines = open("x.txt", "r").readlines()
lines2 = open("c.txt", "r").readlines()
newlines = ["{} {}".format(x.split()[0], y) for x, y in zip(lines, lines2)]
with open("b.txt", "w") as opfile:
    opfile.writelines(newlines)
with open('x.txt', 'r') as lines:
    with open('c.txt', 'r') as lines2:
        with open('b.txt', 'w') as result:
            words = map(lambda x: str(x.split()[0]), lines)
            results = zip(words, lines2)
            for word, line in results:
                result_line = '{0} {1}'.format(word, line)
                result.write(result_line)
This code will work without loading files into memory.